Abstract

We propose a new approach to the theoretical analysis of Loopy Belief Propagation (LBP) 
and the Bethe free energy (BFE) by establishing a formula to connect LBP and BFE with a 
graph zeta function. The proposed approach is applicable to a wide class of models including 
multinomial and Gaussian types. The connection yields a number of new theoretical
results on LBP and BFE. This paper focuses on two such topics. One is the analysis of the
region where the Hessian of the Bethe free energy is positive definite, which shows the
non-convexity of BFE for graphs with multiple cycles and gives a condition for convexity on a
restricted set. This analysis also gives a new condition for the uniqueness of the LBP fixed
point. The other result is to clarify the relation between the local stability of a fixed point 
of LBP and local minima of the BFE, which implies, for example, that a locally stable fixed 
point of the Gaussian LBP is a local minimum of the Gaussian Bethe free energy. 

Keywords: loopy belief propagation, graphical models, Bethe free energy, graph zeta 
function, Ihara-Bass formula 






1. Introduction 

Probability density functions that have "local" factorization structures, called graphical
models, are fundamental in many fields. In statistics, artificial intelligence
and machine learning, for example, graphical modeling has been a powerful tool for
representing prior knowledge and modeling the hidden structure of problems (Whittaker,
2009; Pearl, 1988; Jordan, 1998). Other examples are found in statistical physics, coding
theory, and combinatorial optimization (Pelizzola, 2005; McEliece et al., 1998; Mezard et al.,
2002). Typically, such probability distributions are derived from random variables that only
have local interactions/constraints. This factorization structure is clearly visualized by a
graph, called a factor graph.

Since inference problems on graphical models, such as computation of marginal/conditional
density functions and partition functions, are in general intractable for large graphs, Loopy
Belief Propagation (LBP) has been proposed as an efficient approximation method applicable
to any graph-structured density function. Originally, the Belief Propagation (BP) algorithm
was proposed by Pearl (1988) to compute exactly the marginals for tree-structured graphical
models. This algorithm passes "messages" between vertices of the graph until all information
of the graphical model is distributed throughout the graph. Some researchers have
found that LBP, an extended use of BP for graphs with cycles, shows good approximation
with high potential applicability (Murphy et al., 1999; McEliece et al., 1998). After the
proposal, many extensions and variants have been studied (Yedidia et al., 2001; Sudderth et al.,
2002; Wainwright et al., 2005) and have been applied successfully to many problems, including
coding theory, image processing, sensor network localization and compressive sensing
(Ihler et al., 2005; Baron et al.).
On the theoretical side, a significant number of studies have been carried out
in this decade. One theoretical challenge of LBP is that the algorithm may have
many fixed points; uniqueness is generally guaranteed only for trees and one-cycle graphs
(Weiss, 2000). The LBP fixed points are the solutions of a nonlinear equation associated
with the graph, and the structure of the equation becomes more complicated as the number of
cycles grows. Regarding this problem, a notable result is the variational interpretation
of LBP; it shows that the LBP fixed points are the local minima of the Bethe free energy
(Yedidia et al., 2001). This suggests that the behavior of LBP becomes more complex with the
non-convexity of the Bethe free energy. Another difficulty of LBP is that the algorithm
does not necessarily converge and sometimes shows oscillatory behavior. Concerning the
multinomial model (also known as the discrete variable model), Mooij and Kappen (2007) and
Ihler et al. (2005) give sufficient conditions for convergence in terms of the spectral
radius of a certain matrix related to the graph. Tatikonda and Jordan (2002) also derive a
sufficient condition for convergence, interpreting the convergence as the uniqueness of the
Gibbs measure on the universal covering tree.

The purpose of this paper is to provide a novel discrete geometric approach to the analysis of
the LBP algorithm. The starting point of our study is a question: "How is the behavior
of the LBP algorithm affected by the geometry of the graph?" If the graph is a tree,
(L)BP works nicely; it terminates in a finite number of steps at the unique fixed point and gives
the exact marginals. If the graph has only one cycle it also works appropriately; the
algorithm converges to the unique fixed point and finds the MPM (Maximum Posterior
Marginal) assignment in binary variable cases (Weiss, 2000). Additionally, the Bethe free
energy function is convex in these cases (Pakzad and Anantharam, 2002). The existence of
multiple cycles, however, breaks these nice properties. Few studies
have elucidated the effects of cycles on LBP in detail beyond the "tree or non-tree"
classification. A notable exception is the walk-sum analysis by Johnson et al. (2006)
and Malioutov et al. (2006), but it is limited to the Gaussian case.

This paper proposes a method based on a new connection between LBP, the Bethe free
energy, and a graph zeta function. Graph zeta functions, originally introduced by Ihara (1966),
are popular graph characteristics defined by products over the prime cycles. We
capture the effects of cycles on LBP and the Bethe free energy by establishing a novel formula,
called the Bethe-zeta formula, which connects the Hessian of the Bethe free energy with the
graph zeta function. To derive the formula, we extend the definition of existing graph zeta
functions and the related Ihara-Bass formula (Stark and Terras, 1996; Bass, 1992).

Our discovery of the connection, including the Bethe-zeta formula, yields new ways
of analyzing LBP and the Bethe free energy function that take the graph geometry into account.
It is applicable to a wide class of graphical models defined by "marginally closed"
exponential families, which include multinomial and Gaussian models. This paper discusses
two examples of such analysis: one is the positive definiteness of the Hessian of the Bethe free
energy, and the other is the local stability of the LBP dynamics.

First, based on the connection, we derive conditions under which the Hessian of the Bethe free
energy function is positive definite. As already discussed, analysis of the Bethe free energy is
important for a theoretical understanding of the complex behavior of LBP. As a foundation,
we consider the local properties of the Bethe free energy by elucidating the positive
definiteness of its Hessian, while there are many studies on modifications and convexifications
of the Bethe free energy function (Wiegerinck and Heskes, 2003; Wainwright et al.,
2003b; Weiss et al., 2007). A direct consequence of our analysis is a sufficient condition
for the uniqueness of the LBP fixed point, which is derived by giving a condition for global
convexity. In discussing the positive definiteness, we consider two domains of definition of the
Bethe free energy: one is given by the locally consistent pseudomarginals, and the other is
a more restricted set conditioned by the compatibility functions of the given graphical model.
The beliefs given by LBP always lie in the latter domain. We show that, when considered
on the former domain, the necessary and sufficient condition for the Hessian to be positive
definite is that the underlying factor graph has no more than one cycle. We also give a
sufficient condition for the convexity of the Bethe free energy on the latter domain, which implies
the uniqueness of the LBP fixed point. By numerical examples, we demonstrate that our
new uniqueness condition covers a wider region than the one given by Mooij and Kappen
(2007) for the examples.



In the second application, we clarify the relation between the local structure of the Bethe
free energy function and the local stability of an LBP fixed point. Such a relation is not
necessarily obvious, since LBP is not derived as gradient descent on the Bethe free
energy. In this line of studies, for multinomial models Heskes (2003) shows that a locally
stable fixed point of LBP is a local minimum of the Bethe free energy. We give conditions
for the local stability of LBP and the positive definiteness of the Bethe free energy in terms
of the eigenvalues of a matrix that appears in the graph zeta function. As a consequence,
the result by Heskes is extended to a wider class including Gaussian distributions.

This paper is organized as follows. In Section 2, we introduce graphical models, LBP
and the Bethe free energy as preliminaries. We formulate the setting in terms of exponential
families. Section 3 includes the definition of a new class of graph zeta functions, the
extension of the Ihara-Bass formula, and related results. Using these results, Section 4 shows
the fundamental results of this paper, the Bethe-zeta formula and the positive definiteness
condition, each stated as a theorem. Section 5 derives a positive-definite region of the Bethe free
energy function, and discusses convexity. In Section 6, we elucidate the relations between
the stability of LBP and the local structure of the Bethe free energy at LBP fixed points.
Section 7 includes discussion and concluding remarks. Proofs omitted from the main body
of the paper are given in the appendices.

2. Preliminaries 

In this section we summarize the background of graphical models and LBP. In Subsection 2.1,
we introduce graphical models in terms of hypergraphs. Subsection 2.2 introduces the LBP
algorithm. The Bethe free energy, which provides an alternative language for formulating the LBP
algorithm, is discussed in Subsection 2.3.

Figure 1: Directed graph representation.

Figure 2: Bipartite graph representation.

2.1 Graphical models 

We begin with basic definitions of hypergraphs because the structures associated with
graphical models are, precisely speaking, hypergraphs.

An ordinary graph $G = (V, E)$ consists of the vertex set $V$ joined by edges of $E$.
Generalizing the notion of graphs, hypergraphs are defined as follows. A hypergraph $H = (V, F)$
consists of a set of vertices $V$ and a set of hyperedges $F$. A hyperedge is a non-empty subset
of $V$. For any vertex $i \in V$, the set of neighbors of $i$ is defined by $N_i := \{\alpha \in F \mid i \in \alpha\}$. Similarly,
for any hyperedge $\alpha \in F$, the set of neighbors of $\alpha$ is defined by $N_\alpha := \{i \in V \mid i \in \alpha\} = \alpha$. The
degrees of $i$ and $\alpha$ are given by $d_i := |N_i|$ and $d_\alpha := |N_\alpha| = |\alpha|$, respectively. If all the
degrees of hyperedges are two, then the hypergraph is naturally identified with an ordinary
graph.

In order to describe the message passing algorithm in Subsection 2.2.2, it is convenient
to identify a relation $i \in \alpha$ with a directed edge $\alpha \to i$. For example, let $H =
(\{1, 2, 3, 4\}, \{\alpha_1, \alpha_2, \alpha_3\})$, where $\alpha_1 = \{1, 2\}$, $\alpha_2 = \{1, 2, 3, 4\}$ and $\alpha_3 = \{4\}$; this hypergraph
is shown as a directed graph in Fig. 1. Explicitly writing the set of directed edges
$E$, a hypergraph $H$ is also denoted by $H = (V \cup F, E)$. Note that, forgetting the edge
directions, $H$ is also represented as a bipartite graph (Fig. 2).

We define basic notions of hypergraphs via their corresponding bipartite graphs. A hypergraph
$H$ is connected (resp. a tree) if the corresponding bipartite graph is connected (resp.
a tree). In the same way, the number of connected components (resp. nullity) of $H$ is defined
and denoted by $k(H)$ (resp. $n(H)$). Therefore, $n(H) := |E| - |V| - |F| + k(H)$, and a hypergraph
$H$ is a tree if and only if $n(H) = 0$ and $k(H) = 1$.
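
The notions of neighbors, degrees, connected components and nullity can be checked on the example hypergraph with a minimal Python sketch. The names `V`, `F`, `count_components` and the string labels for the hyperedges are illustrative assumptions, not notation from the paper; the nullity is computed as the cyclomatic number of the bipartite graph, $|E| - |V| - |F| + k(H)$.

```python
from collections import defaultdict

# Example hypergraph: V = {1, 2, 3, 4}, F = {a1, a2, a3} (labels are hypothetical)
V = [1, 2, 3, 4]
F = {"a1": {1, 2}, "a2": {1, 2, 3, 4}, "a3": {4}}

# One directed edge (alpha -> i) for each relation "i in alpha"
E = [(a, i) for a, members in F.items() for i in sorted(members)]

# Neighbors N_i and degrees d_i, d_alpha
N_vertex = {i: {a for a, members in F.items() if i in members} for i in V}
d_vertex = {i: len(N_vertex[i]) for i in V}
d_factor = {a: len(members) for a, members in F.items()}


def count_components(V, F):
    """Number of connected components k(H) of the bipartite graph on V and F."""
    adj = defaultdict(set)
    for a, members in F.items():
        for i in members:
            adj[("v", i)].add(("f", a))
            adj[("f", a)].add(("v", i))
    nodes = [("v", i) for i in V] + [("f", a) for a in F]
    seen, k = set(), 0
    for start in nodes:
        if start in seen:
            continue
        k += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search over one component
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return k


k_H = count_components(V, F)
n_H = len(E) - len(V) - len(F) + k_H  # cyclomatic number of the bipartite graph
is_tree = (n_H == 0 and k_H == 1)
print(d_vertex, d_factor, k_H, n_H, is_tree)
```

For this example the bipartite graph is connected and contains the single cycle through vertices 1 and 2 and the hyperedges `a1` and `a2`, so the hypergraph is not a tree.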

Our primary interest is probability density functions that have factorization structures 
represented by hypergraphs. In such situations, a hypergraph is often referred to as a factor 
graph and a hyperedge as a factor. 

Definition 1 Let $H = (V, F)$ be a hypergraph. For each $i \in V$, let $x_i$ be a variable that
takes values in a set $\mathcal{X}_i$. A probability density function $p$ on $x = (x_i)_{i \in V}$ is said to be
graphically factorized with respect to $H$ if it has the following factorized form

$$p(x) = \frac{1}{Z} \prod_{\alpha \in F} \Psi_\alpha(x_\alpha), \qquad (1)$$






LBP, BFE AND Graph Zeta Function 



where $x_\alpha = (x_i)_{i \in \alpha}$, $Z$ is the normalization constant and $\Psi_\alpha$ are positive valued functions
called compatibility functions. A set of compatibility functions, giving a graphically factorized
density function, is called a graphical model. The associated hypergraph $H$ is called
the factor graph of the graphical model.



Factor graphs were introduced by Kschischang et al. (2001). Any probability density function
on $\mathcal{X} = \prod_{i \in V} \mathcal{X}_i$ is trivially graphically factorized with respect to the "one-factor
hypergraph", where the unique factor includes all vertices. A factorization is more informative if it
involves factors of small size. Our implicit assumption throughout this paper is
that for all factors $\alpha$, $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$ is small enough, in the sense of cardinality or dimension,
to be handled efficiently by computers.

2.2 Loopy Belief Propagation algorithm 

Given a graphical model, our task is to solve inference problems such as the computation of
marginal/conditional density functions and the partition function. Belief Propagation (BP)
efficiently computes the exact marginals of a joint distribution that is factorized according 
to a tree-structured factor graph; Loopy Belief Propagation (LBP) is a heuristic application 
of the algorithm for factor graphs with cycles, showing successful performance in various 
problems. 

First, in Subsection 2.2.1, we introduce a collection of exponential families called an inference
family to formulate the LBP algorithm. In order to perform inference using LBP, we
have to fix an inference family that "includes" the given graphical model. Our formulation
is a variant of the approach by Wainwright et al. (2003a), where over-complete sufficient
statistics are exploited. The details of the LBP algorithm are described in Subsection 2.2.2.

2.2.1 Exponential families and Inference family

To clarify notation, here we summarize basic facts on exponential families. Let $(\mathcal{X}, \mathcal{B}, \nu)$ be
a measure space. For given $n$ real valued functions (sufficient statistics) $\phi(x) = (\phi_1(x), \ldots, \phi_n(x))$,
an exponential family is given by

$$p(x; \theta) = \exp\Big( \sum_{i=1}^{n} \theta_i \phi_i(x) - \psi(\theta) \Big), \qquad \psi(\theta) := \log \int \exp\Big( \sum_{i=1}^{n} \theta_i \phi_i(x) \Big) \, d\nu(x).$$

The natural parameter, $\theta$, ranges over the set $\Theta := \mathrm{int}\{\theta \in \mathbb{R}^n ; \psi(\theta) < \infty\}$, where int
denotes the interior of the set. The function $\psi(\theta)$ is called the log partition function. We
always assume that the Hessian of this function (i.e. the covariance matrix) is invertible.
The derivative of the log partition function gives a bijective map

$$\Lambda : \Theta \ni \theta \mapsto \frac{\partial \psi}{\partial \theta}(\theta) = E_{p_\theta}[\phi] \in Y := \Lambda(\Theta)$$

and this alternative parameter $\eta = \Lambda(\theta)$ is called the expectation parameter. The inverse of this
map is given by the derivative of the Legendre transform $\varphi(\eta) = \sup_{\theta \in \Theta} \big( \sum_i \theta_i \eta_i - \psi(\theta) \big) = E_{p_{\Lambda^{-1}(\eta)}}[\log p_{\Lambda^{-1}(\eta)}]$.
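
As an illustration of the maps between natural and expectation parameters, the following Python sketch works them out for a finite state space with indicator sufficient statistics. It assumes the counting measure as base measure; the function names (`log_partition`, `to_expectation`, `to_natural`, `legendre`) are hypothetical helpers introduced only for this sketch. It checks that the inverse map recovers the natural parameter and that the Legendre transform equals $\sum_i \theta_i \eta_i - \psi(\theta)$ at the matching pair, i.e. the negative entropy.

```python
import math


def log_partition(theta):
    """psi(theta) = log(1 + sum_k exp(theta_k)) for N - 1 natural parameters."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))


def to_expectation(theta):
    """Gradient of psi: eta_k = P(x = k) for k = 0, ..., N - 2."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [math.exp(t) / z for t in theta]


def to_natural(eta):
    """Inverse map: theta_k = log(eta_k / P(x = N - 1))."""
    p_last = 1.0 - sum(eta)
    return [math.log(e / p_last) for e in eta]


def legendre(eta):
    """phi(eta) = E[log p], the negative entropy of the distribution."""
    probs = list(eta) + [1.0 - sum(eta)]
    return sum(p * math.log(p) for p in probs)


theta = [0.3, -1.2]  # N = 3 states, so two natural parameters
eta = to_expectation(theta)
theta_back = to_natural(eta)

# At eta = Lambda(theta), the supremum in the Legendre transform is attained at theta.
dual_gap = sum(t * e for t, e in zip(theta, eta)) - log_partition(theta) - legendre(eta)
print(eta, theta_back, dual_gap)
```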
Example 1 [Multinomial distributions] Let $\mathcal{X} = \{0, \ldots, N-1\}$ be a finite set with the
uniform base measure. One way of taking sufficient statistics is

$$\phi_k(x) = \begin{cases} 1 & \text{if } x = k \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

for $k = 0, \ldots, N-2$. The resulting exponential family is called the multinomial distributions
and coincides with the set of all probability density functions on $\mathcal{X}$ that have positive probabilities
for all elements of $\mathcal{X}$. The region of natural parameters is $\Theta = \mathbb{R}^{N-1}$ and the region of expectation
parameters is the interior of the probability simplex. That is, $Y = \{(y_1, \ldots, y_N) ; \sum_{k=1}^{N} y_k = 1,\ y_k > 0\}$.

Example 2 [Gaussian distributions] Let $\mathcal{X} = \mathbb{R}^n$ with the Lebesgue measure. The
exponential family given by the sufficient statistics $\phi(x) = (x_i, x_j x_k)_{1 \le i \le n,\, 1 \le j \le k \le n}$ is called
the Gaussian distributions and consists of probability density functions of the form

$$p(x; \theta) = \exp\Big( \sum_{i \le j} \theta_{ij} x_i x_j + \sum_{i} \theta_i x_i - \psi(\theta) \Big).$$

Example 3 [Fixed-mean Gaussian distributions] For a given mean vector $\mu = (\mu_i)$, the
fixed-mean Gaussian distributions form the exponential family obtained from the sufficient statistics
$\phi(x) = \{(x_i - \mu_i)(x_j - \mu_j)\}_{1 \le i \le j \le n}$.
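
For the Gaussian family of Example 2 in one dimension, the relation between the log partition function and the expectation parameters can be checked numerically. The sketch below is only an illustration: it assumes $n = 1$, writes the density as $\exp(\theta_2 x^2 + \theta_1 x - \psi)$ with $\theta_2 < 0$, and uses hypothetical helper names. It compares a finite-difference gradient of $\psi$ with the moments $(E[x], E[x^2])$.

```python
import math


def log_partition(theta1, theta2):
    """psi = log of the integral of exp(theta2 * x^2 + theta1 * x) over R (needs theta2 < 0)."""
    return 0.5 * math.log(math.pi / -theta2) - theta1 ** 2 / (4.0 * theta2)


def natural_from_moments(mean, var):
    """Natural parameters of a 1-D Gaussian with the given mean and variance."""
    return mean / var, -0.5 / var


def expectation_from_moments(mean, var):
    """Expectation parameters (E[x], E[x^2])."""
    return mean, mean ** 2 + var


def numerical_gradient(f, t1, t2, h=1e-6):
    """Central finite differences of f with respect to both arguments."""
    d1 = (f(t1 + h, t2) - f(t1 - h, t2)) / (2.0 * h)
    d2 = (f(t1, t2 + h) - f(t1, t2 - h)) / (2.0 * h)
    return d1, d2


mean, var = 1.5, 0.7
theta1, theta2 = natural_from_moments(mean, var)
grad = numerical_gradient(log_partition, theta1, theta2)
eta = expectation_from_moments(mean, var)
print(grad, eta)  # the two pairs should agree up to finite-difference error
```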

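Here and below, we construct a set of exponential families. In order to perform inference
using LBP for a given graphical model, we have to fix a "family" that includes the given
probability density function.

Let $H = (V, F)$ be a hypergraph. First, for each vertex $i$, we consider an exponential
family $\mathcal{E}_i$ with a sufficient statistic $\phi_i$ and a base measure $\nu_i$ on $\mathcal{X}_i$. The natural parameter,
expectation parameter, log partition function and its Legendre transform are denoted
by $\theta_i$, $\eta_i$, $\psi_i$ and $\varphi_i$ respectively. Secondly, for each factor $\alpha = \{i_1, \ldots, i_{d_\alpha}\}$, we give an
exponential family $\mathcal{E}_\alpha$ on $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$ with the base measure $\nu_\alpha = \prod_{i \in \alpha} \nu_i$ and a sufficient
statistic $\phi_\alpha$ of the form

$$\phi_\alpha(x_\alpha) = \big( \phi_{\langle\alpha\rangle}(x_\alpha), \phi_{i_1}(x_{i_1}), \ldots, \phi_{i_{d_\alpha}}(x_{i_{d_\alpha}}) \big). \qquad (3)$$

An important point is that $\phi_\alpha$ includes the sufficient statistics $\phi_i$ for $i \in \alpha$ as its components
in addition to $\phi_{\langle\alpha\rangle}$, which is indexed by $\alpha \in F$. The natural parameter, expectation parameter, log
partition function and its Legendre transform are denoted by

$$\theta_\alpha = (\theta_{\langle\alpha\rangle}, \theta_{\alpha:i_1}, \ldots, \theta_{\alpha:i_{d_\alpha}}) \in \Theta_\alpha, \quad \eta_\alpha = (\eta_{\langle\alpha\rangle}, \eta_{\alpha:i_1}, \ldots, \eta_{\alpha:i_{d_\alpha}}) \in Y_\alpha, \quad \psi_\alpha \text{ and } \varphi_\alpha. \qquad (4)$$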

The following assumption is indispensable to our analysis:

Assumption 1 For all $i \in V$ and $\alpha \in F$, we assume that the Hessians of the log partition
functions $\psi_i$ and $\psi_\alpha$ (i.e. the covariance matrices) are invertible in the parameter spaces.

In order to use these exponential families $\mathcal{E}_\alpha$ and $\mathcal{E}_i$ for LBP, we need another assumption:
the family is "closed" under the marginalization operation. This type of condition on
exponential families is also considered elsewhere in the literature (Mardia et al., 2009).
Assumption 2 (Marginally closed assumption) For all pairs $i \in \alpha$,

$$\int p(x_\alpha) \, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) \in \mathcal{E}_i \quad \text{for all } p \in \mathcal{E}_\alpha. \qquad (5)$$

Definition 2 If a collection of exponential families $\mathcal{I} := \{\mathcal{E}_\alpha, \mathcal{E}_i\}$ given by sufficient
statistics $(\phi_{\langle\alpha\rangle}(x_\alpha), \phi_i(x_i))_{\alpha \in F, i \in V}$ as above satisfies Assumptions 1 and 2, it is called an
inference family associated with a hypergraph $H$. An inference family is called pairwise if
the associated hypergraph is a graph.

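An inference family has a parameter set $\Theta = \prod_\alpha \Theta_\alpha \times \prod_i \Theta_i$, which is bijectively mapped
to the dual parameter set $Y = \prod_\alpha Y_\alpha \times \prod_i Y_i$ by the maps of the respective components. An
inference family naturally defines an exponential family on $\mathcal{X} = \prod_i \mathcal{X}_i$ with the sufficient
statistic $(\phi_{\langle\alpha\rangle}(x_\alpha), \phi_i(x_i))_{\alpha \in F, i \in V}$. We denote it by $\mathcal{E}(\mathcal{I})$.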

Example 4 [Binary pairwise inference family] Consider the case that a graph $G = (V, E)$ is
the factor graph. For each $i \in V$, we define an exponential family $\mathcal{E}_i$ on $\mathcal{X}_i = \{0, 1\}$
by $\phi_i(x_i) = x_i$. For each $\{i, j\} \in E$, we also define a multinomial exponential family $\mathcal{E}_{\{i,j\}}$
on $\{0, 1\}^2$ by $\phi_{\{i,j\}}(x_i, x_j) = (\phi_{\langle ij \rangle}(x_i, x_j), x_i, x_j)$, where $\phi_{\langle ij \rangle}(x_i, x_j) = x_i x_j$. Then these exponential
families give an inference family since Assumption 2 is trivially satisfied.

Example 5 [Multinomial inference family] Let $\mathcal{E}_i$ be an exponential family of multinomial
distributions. Choosing functions $\phi_{\langle\alpha\rangle}(x_\alpha)$, we can make $\mathcal{E}_\alpha$ the multinomial distributions
on $\mathcal{X}_\alpha$; more precisely, we choose $\phi_{\langle\alpha\rangle}(x_\alpha)$ so that the components of $\phi_\alpha(x_\alpha)$, which
are regarded as $\prod_{i \in \alpha} |\mathcal{X}_i|$ dimensional vectors, are linearly independent. Then we obtain an
inference family called a multinomial inference family.

Example 6 [Gaussian inference family] We consider the case that $\mathcal{X}_i = \mathbb{R}$ (see footnote 1). For the Gaussian
case, given a factor graph $H = (V, F)$, the sufficient statistics are given by

$$\phi_i(x_i) = (x_i, x_i^2), \qquad \phi_{\langle\alpha\rangle}(x_\alpha) = (x_i x_j)_{i, j \in \alpha,\, i \ne j}.$$

Then the inference family is called a Gaussian inference family. Assumption 2 is satisfied
because a marginal of a Gaussian density function is a Gaussian density function. A fixed-mean
inference family is analogously defined by $\phi_i(x_i) = (x_i - \mu_i)^2$ and $\phi_{\langle\alpha\rangle}(x_\alpha) = ((x_i -
\mu_i)(x_j - \mu_j))_{i, j \in \alpha,\, i \ne j}$. Usually, for Gaussian cases, the factor graph $H$ is a graph rather than
a hypergraph (see footnote 2); thus, we only consider Gaussian inference families on graphs. □

2.2.2 LBP ALGORITHM 

The LBP algorithm calculates the approximate marginals of a given graphical model $\Psi =
\{\Psi_\alpha\}$ using the inference family $\mathcal{I}$. We always assume that the inference
family includes the given probability density function:

Assumption 3 For every factor $\alpha \in F$, there exists $\bar\theta_\alpha$ s.t.

$$\Psi_\alpha(x_\alpha) = \exp\big( \langle \bar\theta_\alpha, \phi_\alpha(x_\alpha) \rangle \big). \qquad (6)$$

1. Extensions to the high dimensional case, i.e. $\mathcal{X}_i = \mathbb{R}^{r_i}$, are straightforward.

2. Extensions to the cases of hypergraphs are also straightforward.
Figure 3: The blue messages contribute to the red message at the next time step.

This is equivalent to the assumption

$$p(x) = \frac{1}{Z} \prod_{\alpha} \Psi_\alpha(x_\alpha) \in \mathcal{E}(\mathcal{I}) \qquad (7)$$

up to a trivial re-scaling of $\Psi_\alpha$, which does not affect the LBP algorithm.



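The procedure of the LBP algorithm is as follows (Kschischang et al., 2001). For each
pair of a vertex $i \in V$ and a factor $\alpha \in F$ satisfying $i \in \alpha$, an initialized message is given in
the form of

$$m^0_{\alpha \to i}(x_i) = \exp\big( \langle \mu^0_{\alpha \to i}, \phi_i(x_i) \rangle \big), \qquad (8)$$

where the choice of $\mu^0_{\alpha \to i}$ is arbitrary. The set $\{m^0_{\alpha \to i}\}$ or $\{\mu^0_{\alpha \to i}\}$ is called an initialization
of the LBP algorithm. At each time $t$, the messages are updated by the following rule:

$$m^{t+1}_{\alpha \to i}(x_i) = \omega \int \Psi_\alpha(x_\alpha) \prod_{j \in \alpha,\, j \ne i} \; \prod_{\beta \ni j,\, \beta \ne \alpha} m^t_{\beta \to j}(x_j) \, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) \qquad (t \ge 0), \qquad (9)$$

where $\omega$ is a certain scaling constant (see footnote 3). See Fig. 3 for an illustration of this message update
scheme. From Assumptions 2 and 3, the messages keep the form of Eq. (8).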

Since this update rule simultaneously generates all messages of time t + 1 by those of 
time t, it is called a parallel update. Another possibility of the update is a sequential update, 
where, at each time step, one message is chosen according to some prescribed or random 
order of directed edges. In this paper, we mainly discuss the parallel update. 

We repeat the update Eq. ([9]) until the messages converge to a fixed point, though 
this procedure is not guaranteed to converge. Indeed, it sometimes exhibits oscillatory 
behaviors. The set of LBP fixed points does not depend on the choices of the update rule, 
but converging behavior, or dynamics, does depend on the choices. 

3. Here and below, we do not address the integrability problem. For multinomial and Gaussian cases,
there are no problems.

If the algorithm converges, we obtain the fixed point messages $\{m^*_{\alpha \to i}\}$ and beliefs that
are defined by

$$b_i(x_i) := \omega \prod_{\alpha \ni i} m^*_{\alpha \to i}(x_i), \qquad (10)$$

$$b_\alpha(x_\alpha) := \omega \, \Psi_\alpha(x_\alpha) \prod_{j \in \alpha} \; \prod_{\beta \ni j,\, \beta \ne \alpha} m^*_{\beta \to j}(x_j), \qquad (11)$$

where $\omega$ denotes (not necessarily the same) normalization constants that require

$$\int b_i(x_i) \, d\nu_i = 1 \quad \text{and} \quad \int b_\alpha(x_\alpha) \, d\nu_\alpha = 1. \qquad (12)$$

Note that beliefs automatically satisfy the conditions $b_\alpha(x_\alpha) > 0$ and

$$\int b_\alpha(x_\alpha) \, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) = b_i(x_i). \qquad (13)$$

The beliefs are used for approximation of the true marginal density functions.

If $H$ is a tree, the LBP algorithm stops after at most $|E|$ updates and the computed beliefs
are equal to the exact marginals of the given density function.
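
The parallel message update and the vertex beliefs (normalized products of incoming messages) can be sketched for discrete variables as follows. This is a minimal illustration, not the authors' implementation: factors are stored as hypothetical `(scope, table)` pairs, messages start uniform (one admissible initialization), and integrals become finite sums. On a tree-structured example the beliefs are compared with brute-force marginals.

```python
import itertools


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def lbp_beliefs(n_states, factors, n_iters=20):
    """Parallel LBP on a discrete factor graph; returns the vertex beliefs b_i."""
    edges = [(a, i) for a, (scope, _) in enumerate(factors) for i in scope]
    msg = {e: [1.0] * n_states[e[1]] for e in edges}  # uniform initialization
    for _ in range(n_iters):
        new = {}
        for a, i in edges:
            scope, table = factors[a]
            out = [0.0] * n_states[i]
            for xa in itertools.product(*(range(n_states[j]) for j in scope)):
                w = table[xa]
                for j, xj in zip(scope, xa):
                    if j == i:
                        continue
                    # messages into j from every factor other than a
                    for b, k in edges:
                        if k == j and b != a:
                            w *= msg[(b, j)][xj]
                out[xa[scope.index(i)]] += w
            new[(a, i)] = normalize(out)
        msg = new  # all messages are replaced at once (parallel update)
    beliefs = []
    for i in range(len(n_states)):
        b = [1.0] * n_states[i]
        for a, k in edges:
            if k == i:
                b = [bx * m for bx, m in zip(b, msg[(a, i)])]
        beliefs.append(normalize(b))
    return beliefs


def exact_marginals(n_states, factors):
    """Brute-force single-variable marginals of the factorized density."""
    marg = [[0.0] * s for s in n_states]
    for x in itertools.product(*(range(s) for s in n_states)):
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(x[j] for j in scope)]
        for i, xi in enumerate(x):
            marg[i][xi] += w
    return [normalize(m) for m in marg]


# Tree-structured example: chain 0 - 1 - 2 plus a single-variable factor on vertex 0
pair = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
single = {(0,): 1.0, (1,): 2.0}
n_states = [2, 2, 2]
factors = [((0, 1), pair), ((1, 2), pair), ((0,), single)]

beliefs = lbp_beliefs(n_states, factors)
exact = exact_marginals(n_states, factors)
max_error = max(abs(b - e) for bi, ei in zip(beliefs, exact) for b, e in zip(bi, ei))
print(max_error)
```

Because the example factor graph is a tree, the beliefs should agree with the exact marginals up to floating-point error; on graphs with cycles the same code only returns approximations and need not converge.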

2.3 Bethe free energy and characterization of LBP fixed points 



The Bethe approximation was initiated by Bethe (1935) and was found to be essentially
equivalent to LBP by Yedidia et al. (2001). The modern formulation for presenting the
approximation is a variational problem of the Bethe free energy (An, 1988). In this subsection,
we summarize these facts in our setting.

First, we introduce the Gibbs free energy function because the Bethe free energy
function is a computationally tractable approximation of the Gibbs free energy function.
For a given graphical model $\Psi = \{\Psi_\alpha\}$, the Gibbs free energy $F_{Gibbs}$ is a convex function over
the set of probability distributions $\hat{p}$ on $x = (x_i)_{i \in V}$ defined by

$$F_{Gibbs}(\hat{p}) = \int \hat{p}(x) \log\Big( \frac{\hat{p}(x)}{\prod_\alpha \Psi_\alpha(x_\alpha)} \Big) \, d\nu(x), \qquad (14)$$

where $\nu = \prod_{i \in V} \nu_i$ is the base measure on $\mathcal{X} = \prod_{i \in V} \mathcal{X}_i$. Using the Kullback-Leibler divergence
$D(q \| p) = \int q \log(q/p)$, Eq. (14) becomes $F_{Gibbs}(\hat{p}) = D(\hat{p} \| p) - \log Z$. Therefore, the exact
density function Eq. (1) is characterized by the variational problem

$$p(x) = \operatorname{argmin}_{\hat{p}} F_{Gibbs}(\hat{p}), \qquad (15)$$

where the minimum is taken over all probability distributions on $x$. As suggested by the
name "free energy", the minimum value of this function is equal to $-\log Z$.
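
The identity between the Gibbs free energy and the Kullback-Leibler divergence can be verified numerically on a tiny discrete model. The sketch below assumes two binary variables with a single table of positive weights standing in for the product of compatibility functions; the names `weights`, `gibbs_free_energy` and `kl` are illustrative only.

```python
import math

# Unnormalized weights prod_alpha Psi_alpha(x_alpha) on two binary variables (hypothetical values)
weights = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}  # exact density


def gibbs_free_energy(q):
    """F_Gibbs(q) = sum_x q(x) log(q(x) / prod_alpha Psi_alpha(x_alpha))."""
    return sum(qx * math.log(qx / weights[x]) for x, qx in q.items() if qx > 0)


def kl(q, p):
    """Kullback-Leibler divergence D(q || p)."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0)


q = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # an arbitrary trial distribution

identity_gap = gibbs_free_energy(q) - (kl(q, p) - math.log(Z))
minimum_gap = gibbs_free_energy(p) + math.log(Z)
print(identity_gap, minimum_gap)  # both should vanish up to rounding
```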

In many cases, including discrete variables, computing values of the Gibbs free energy
function is intractable in general because the integral in Eq. (14) is in fact a sum over
$|\mathcal{X}| = \prod_i |\mathcal{X}_i|$ states. We introduce a function called the Bethe free energy that does not include
such an exponential number of state sums.

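Definition 3 The Bethe free energy (BFE) function is a function of expectation parameters.
For a given inference family $\mathcal{I}$, define $L(\mathcal{I}) := \{\eta = \{\eta_\alpha, \eta_i\} \in Y \mid \eta_{\alpha:i} = \eta_i \ \forall (i \in \alpha)\}$ (see footnote 4).
On this set, the Bethe free energy function is defined by

$$F(\eta) := -\sum_{\alpha \in F} \langle \bar\theta_\alpha, \eta_\alpha \rangle + \sum_{\alpha \in F} \varphi_\alpha(\eta_\alpha) + \sum_{i \in V} (1 - d_i) \varphi_i(\eta_i), \qquad (16)$$

where $\bar\theta_\alpha$ is the natural parameter of $\Psi_\alpha$ in Eq. (6).

4. We often write $L(\mathcal{I})$ as $L$ when $\mathcal{I}$ is obvious from the context. Since $Y = \prod_\alpha Y_\alpha \times \prod_i Y_i$ is convex,
$L$ is a convex set. If the inference family is multinomial, the closure of this set is called the local polytope
(Wainwright and Jordan, 2008, 2003).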
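An expectation parameter specifies a probability density function in the exponential family.
Thus, $\eta \in Y$ specifies $\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}$, where $b_\alpha(x_\alpha) \in \mathcal{E}_\alpha$ and $b_i(x_i) \in \mathcal{E}_i$. The
constraint $\eta_{\alpha:i} = \eta_i$ means that

$$\int \phi_i(x_i) b_\alpha(x_\alpha) \, d\nu_\alpha = \int \phi_i(x_i) b_i(x_i) \, d\nu_i.$$

Under Assumption 2, this condition is equivalent to $\int b_\alpha(x_\alpha) \, d\nu_{\alpha \setminus i} = b_i(x_i)$ because a
probability density function in $\mathcal{E}_i$ is specified by the expectation of $\phi_i(x_i)$. An element of $L$
is called a set of pseudomarginals. Therefore, we have the following identification

$$L = \Big\{ \{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V} \ \Big|\ b_\alpha(x_\alpha) \in \mathcal{E}_\alpha,\ b_i(x_i) \in \mathcal{E}_i \text{ and } \int b_\alpha(x_\alpha) \, d\nu_{\alpha \setminus i} = b_i(x_i) \ \forall (i \in \alpha) \Big\}.$$

The second condition is called local consistency. Under this identification, the Bethe free
energy function is

$$F(\{b_\alpha(x_\alpha), b_i(x_i)\}) = -\sum_{\alpha \in F} \int b_\alpha(x_\alpha) \log \Psi_\alpha(x_\alpha) \, d\nu_\alpha + \sum_{\alpha \in F} \int b_\alpha(x_\alpha) \log b_\alpha(x_\alpha) \, d\nu_\alpha + \sum_{i \in V} (1 - d_i) \int b_i(x_i) \log b_i(x_i) \, d\nu_i.$$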

If $H$ is a tree, the variational problem of the Bethe free energy over $L$ is equivalent to
that of the Gibbs free energy in the following sense. See Wainwright and Jordan for
more details. First, it can be shown that, for any $\{b_\alpha(x_\alpha), b_i(x_i)\} \in L$,

$$\Pi(\{b_\alpha(x_\alpha), b_i(x_i)\}) := \prod_\alpha b_\alpha(x_\alpha) \prod_i b_i(x_i)^{1 - d_i} \qquad (17)$$

is a probability density function because it integrates to one. For this type of density
function, we can see that the Gibbs free energy function is equal to the Bethe free energy
function: $F = F_{Gibbs} \circ \Pi$. Secondly, it is also known that the true density function $p$ for a
tree has a factorization of the form Eq. (17). Therefore, the variational problem Eq. (15)
reduces to that of the Bethe free energy function over $L$.
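
The tree case can be checked numerically: evaluating the Bethe free energy at the exact marginals of a tree-structured model should return $-\log Z$, the minimum of the Gibbs free energy. The following sketch assumes a small discrete chain with hypothetical compatibility tables and computes marginals by brute force; it illustrates the statement and does not replace the proof.

```python
import itertools
import math

# Tree-structured model: chain 0 - 1 - 2 plus a single-variable factor on vertex 0
pair = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
single = {(0,): 1.0, (1,): 2.0}
n_states = [2, 2, 2]
factors = [((0, 1), pair), ((1, 2), pair), ((0,), single)]

# Joint weights prod_alpha Psi_alpha(x_alpha) and the normalization constant Z
joint = {}
for x in itertools.product(*(range(s) for s in n_states)):
    w = 1.0
    for scope, table in factors:
        w *= table[tuple(x[j] for j in scope)]
    joint[x] = w
Z = sum(joint.values())


def marginal(scope):
    """Exact marginal of the normalized density on the variables in scope."""
    out = {}
    for x, w in joint.items():
        key = tuple(x[j] for j in scope)
        out[key] = out.get(key, 0.0) + w / Z
    return out


# d_i = number of factors containing vertex i
degree = [sum(1 for scope, _ in factors if i in scope) for i in range(len(n_states))]

bethe = 0.0
for scope, table in factors:
    b = marginal(scope)
    # energy term plus negative-entropy term of the factor belief
    bethe += sum(bx * (math.log(bx) - math.log(table[xs])) for xs, bx in b.items())
for i in range(len(n_states)):
    b = marginal((i,))
    bethe += (1 - degree[i]) * sum(bx * math.log(bx) for bx in b.values())

print(bethe, -math.log(Z))  # the two values should coincide on a tree
```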

For general factor graphs, the Bethe variational problem approximates the Gibbs variational
problem and a minimizer of the Bethe problem can be used to approximate the
marginal density functions. As shown by Pakzad and Anantharam (2002), the Bethe free
energy function is convex if the factor graph has at most one cycle. Therefore, the minimization
of the Bethe free energy is easy in these cases. In general, however, the convexity
of $F$ is broken as the nullity of the underlying factor graph becomes large, yielding multiple
minima. Though the functions $\varphi_\alpha$ and $\varphi_i$ are convex, the negative coefficients $(1 - d_i)$ make
the function $F$ complex. The positive definiteness of the Hessian of the Bethe free energy
will be analyzed in Sections 4 and 5.

The Bethe free energy function gives an alternative description of the LBP fixed points.
The following fact is shown by Yedidia et al. (2001): LBP finds a stationary point of the
Bethe free energy function, which is a necessary condition for minimality. We give the
proof in our terms in Appendix A.2.
Theorem 4 Let $\mathcal{I}$ be an inference family and $\Psi = \{\Psi_\alpha\}$ be a graphical model. The following
sets are naturally identified with each other.

1. The set of fixed points of loopy belief propagation.

2. The set of stationary points of $F$ over $L(\mathcal{I})$.

3. Graph zeta function 

The aim of this section is to introduce the graph zeta function and develop some results, 
which are used in the later sections. 

Ihara's graph zeta function was originally introduced by Y. Ihara (1966) for a certain
algebraic object, and was abstracted and extended to be defined on arbitrary finite graphs
by J. P. Serre (1980), Sunada (1986) and Bass (1992). The edge zeta function is a multivariable
generalization of Ihara's graph zeta function, allowing an arbitrary scalar weight for
each directed edge (Stark and Terras, 1996). Extending those graph zeta functions, we
introduce a graph zeta function defined on hypergraphs with matrix weights.

The central result of this section is the Ihara-Bass type determinant formula in Subsection 3.2.
This formula plays an important role in deriving the positive definiteness
condition in Subsection 3.4. These results are utilized to establish the relations between
this zeta function and the LBP algorithm in the next section.

3.1 Definition of the graph zeta function 

In the first part of this subsection, we further introduce basic definitions and notations of 
hypergraphs required for the definition of our graph zeta function. 

Let $H = (V, F)$ be a hypergraph. As noted before, it can be regarded as a directed graph
$H = (V \cup F, E)$. For each edge $e = (\alpha \to i) \in E$, $s(e) = \alpha \in F$ is the starting hyperedge
of $e$ and $t(e) = i \in V$ is the terminus vertex of $e$. If two edges $e, e' \in E$ satisfy the conditions
$t(e) \in s(e')$ and $t(e) \ne t(e')$, this pair is denoted by $e \rightharpoonup e'$. (See Figure 4.) A sequence
of directed edges $(e_1, \ldots, e_k)$ is said to be a closed geodesic if $e_l \rightharpoonup e_{l+1}$ for $l \in \mathbb{Z}/k\mathbb{Z}$. For
a closed geodesic $c$, we may form the $m$-multiple $c^m$ by repeating $c$ $m$ times. If $c$ is not
a multiple of a strictly shorter closed geodesic, $c$ is said to be prime. For example, a closed
geodesic $c = (e_1, e_2, e_3, e_1, e_2, e_3)$ is not prime because $c = (e_1, e_2, e_3)^2$. A closed geodesic
$c = (e_1, e_2, e_3, e_4, e_1, e_2, e_3)$ is prime because $c \ne c'^m$ for any $c'$ and $m$ ($\ge 2$). Two
closed geodesics are said to be equivalent if one is obtained by a cyclic permutation of the
other. For example, the closed geodesics $(e_1, e_2, e_3)$, $(e_2, e_3, e_1)$ and $(e_3, e_1, e_2)$ are equivalent.
An equivalence class of a prime closed geodesic is called a prime cycle. The set of prime
cycles of $H$ is denoted by $\mathfrak{P}_H$.
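
The combinatorial part of these definitions, primality and equivalence under cyclic permutation, can be sketched on sequences of edge labels. The code below is an illustration only: it treats a closed geodesic as a tuple of hypothetical labels such as `"e1"` and does not check the relation between consecutive edges, which would require the hypergraph itself.

```python
def is_prime(seq):
    """True if seq is not an m-fold repetition (m >= 2) of a shorter sequence."""
    n = len(seq)
    for period in range(1, n):
        if n % period == 0 and seq == seq[:period] * (n // period):
            return False
    return True


def canonical_rotation(seq):
    """Lexicographically smallest cyclic rotation, a representative of the class."""
    return min(seq[k:] + seq[:k] for k in range(len(seq)))


def equivalent(c1, c2):
    """True if c2 is obtained from c1 by a cyclic permutation."""
    return len(c1) == len(c2) and canonical_rotation(c1) == canonical_rotation(c2)


c_repeated = ("e1", "e2", "e3", "e1", "e2", "e3")    # equals (e1, e2, e3)^2
c_prime = ("e1", "e2", "e3", "e4", "e1", "e2", "e3")

print(is_prime(c_repeated), is_prime(c_prime))
print(equivalent(("e1", "e2", "e3"), ("e2", "e3", "e1")))
print(equivalent(("e1", "e2", "e3"), ("e3", "e1", "e2")))
```

Using a canonical rotation as the class representative is one convenient design choice; any rule that picks a unique element of each cyclic class would serve equally well.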

If $H$ is a graph (i.e. $d_\alpha = 2$ for all $\alpha \in F$), these definitions reduce to the standard definitions
(Kotani and Sunada, 2000). (We will explicitly give them in Subsection 3.3.) In this case,
a factor $\alpha = \{i, j\}$ is identified with an undirected edge $ij$ and $(\alpha \to i)$ is identified with a
directed edge $(i \leftarrow j)$.

Usually, in graph theory, Ihara's graph zeta function is a univariate function
associated with a graph. Our graph zeta function is much more involved: it is defined on a
hypergraph with matrix weights. To define matrix weights, we have to prescribe their
sizes; we associate a positive integer $r_e$ with each edge $e \in E$.
Figure 4: Example of the relation $e \rightharpoonup e'$.

Here are additional notations used in the following definition. The set of functions (see footnote 5) on
$E$ that take values in $\mathbb{C}^{r_e}$ for each $e \in E$ is denoted by $\mathfrak{X}(E)$. The set of $n_1 \times n_2$ complex
matrices is denoted by $M(n_1, n_2)$.



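Definition 5 Assume that for each $e' \rightharpoonup e$, a matrix weight $u_{e' \to e} \in M(r_e, r_{e'})$ is associated.
For these matrix weights $u = \{u_{e' \to e}\}$, the graph zeta function of $H$ is defined by

$$\zeta_H(u) := \prod_{p \in \mathfrak{P}_H} \frac{1}{\det(I - \pi(p))},$$

where $\pi(p) := u_{e_k \to e_1} \cdots u_{e_2 \to e_3} u_{e_1 \to e_2}$ for $p = (e_1, \ldots, e_k)$.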



Since $\det(I_n - AB) = \det(I_m - BA)$ for $n \times m$ and $m \times n$ matrices $A$ and $B$, $\det(I - \pi(p))$
is well defined for an equivalence class $p$. The definition is an analogue of the Euler product
formula of the Riemann zeta function, which is represented by the product over all the prime
numbers.



If $H$ is a graph and $r_e = 1$ for all $e \in \vec{E}$, this zeta function reduces to the edge zeta
function of Stark and Terras (1996). If in addition all these scalar weights are set to be
equal, i.e. $u_{e' \to e} = u$, the zeta function reduces to the Ihara zeta function. These reductions
will be discussed in Subsection 3.3. Moreover, for general hypergraphs, we obtain the one-variable
hypergraph zeta function by setting all matrix weights to be the same scalar $u$
(Storm, 2006).



Example 7 $\zeta_H(u) = 1$ if $H$ is a tree. For the 1-cycle graph $C_N$ of length $N$, the prime cycles
are $(e_1, e_2, \ldots, e_N)$ and $(\bar{e}_N, \bar{e}_{N-1}, \ldots, \bar{e}_1)$. (See Figure 5.) The zeta function satisfies

$$\zeta_{C_N}(u)^{-1} = \det(I_{r_{e_1}} - u_{e_N \to e_1} \cdots u_{e_2 \to e_3} u_{e_1 \to e_2}) \det(I_{r_{\bar{e}_N}} - u_{\bar{e}_1 \to \bar{e}_N} \cdots u_{\bar{e}_{N-1} \to \bar{e}_{N-2}} u_{\bar{e}_N \to \bar{e}_{N-1}}).$$

Except for the above two types of hypergraphs, the number of prime cycles is infinite.
Therefore, rigorously speaking, we have to care about the convergence of the product and
restrict the definition to sufficiently small matrix weights $u$. However, as we will see below,
the zeta function has a determinant formula and is well defined on the whole space of matrix
weights. The proof is given in Appendix A.2.

5. In mathematical usage, this is not a "function" because it takes a value in a different set for each
argument $e \in \vec{E}$. However, we do not dwell on this point.

Figure 5: $C_3$ and its prime cycles.



Theorem 6 (The first determinant formula of zeta function) We define a linear operator
$\mathcal{M}(u) : X(\vec{E}) \to X(\vec{E})$ by

$$(\mathcal{M}(u) f)(e) = \sum_{e' :\ e' \rightharpoonup e} u_{e' \to e} f(e'), \qquad f \in X(\vec{E}).$$

Then, the following formula holds:

$$\zeta_H(u)^{-1} = \det(I - \mathcal{M}(u)).$$

This type of determinant formula is well known in the context of graph zeta functions;
in fact this theorem is a straightforward generalization of Theorem 3 of Stark and Terras
(1996). In the next subsection we derive a new determinant formula of the zeta function by
manipulating the matrix $\mathcal{M}(u)$ in the above determinant.

Note that the matrix representation of the operator $\mathcal{M}(u)$ is

$$\mathcal{M}(u)_{e,e'} = \begin{cases} u_{e' \to e} & \text{if } e' \rightharpoonup e, \\ 0 & \text{otherwise.} \end{cases}$$

The simplification of this matrix obtained by setting $r_e = 1$ and $u = 1$ is called the directed edge
matrix and denoted by $\mathcal{M}$ (Stark and Terras, 1996). Kotani and Sunada (2000) call this
matrix a Perron-Frobenius operator. A noteworthy difference between our definition and theirs
is that the directions of edges are opposite, because we choose the directions to be consistent
with illustrations of the LBP algorithm.
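As a sanity check of Theorem 6, the following Python sketch compares $\det(I - \mathcal{M}(u))$ with the product over the two prime cycles of the triangle $C_3$ (Example 7) for scalar weights. It is a sketch under stated assumptions: the graph is described in the ordinary-graph language of Subsection 3.3, each weight depends only on the target edge ($u_{e' \to e} = u_e$), the numerical weights are arbitrary, and the helper names `det` and `directed_edge_matrix` are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-14:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def directed_edge_matrix(edges, weight):
    """M[e][e'] = weight[e] if t(e') = o(e) and e' is not the inverse of e, else 0."""
    m = [[0.0] * len(edges) for _ in edges]
    for a, (o, t) in enumerate(edges):
        for b, (o2, t2) in enumerate(edges):
            if t2 == o and o2 != t:
                m[a][b] = weight[(o, t)]
    return m


# Triangle C_3: three directed edges (origin, terminus) in each orientation.
edges = [(0, 1), (1, 2), (2, 0), (1, 0), (2, 1), (0, 2)]
weight = {(0, 1): 0.5, (1, 2): 0.3, (2, 0): 0.7,
          (1, 0): 0.2, (2, 1): 0.9, (0, 2): 0.4}

M = directed_edge_matrix(edges, weight)
n = len(edges)
lhs = det([[(1.0 if a == b else 0.0) - M[a][b] for b in range(n)] for a in range(n)])
# Product over the two prime cycles (one per orientation of the triangle).
rhs = (1 - 0.5 * 0.3 * 0.7) * (1 - 0.2 * 0.9 * 0.4)
print(abs(lhs - rhs) < 1e-12)  # True
```

Because non-backtracking walks on a single cycle never change orientation, the matrix splits into two weighted cyclic blocks, which is why only two factors appear.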



3.2 Determinant formula of Ihara-Bass type 

In the previous subsection, we have shown that the zeta function is expressed as a
determinant of size $\sum_{e \in \vec{E}} r_e$. In this subsection, we show another determinant expression
under additional assumptions on the matrix weights. The formula is called the Ihara-Bass type
determinant formula and plays a key role in the proofs of Theorem 10 and Theorem 11.

In the rest of this subsection, we fix a set of positive integers $\{r_i\}_{i \in V}$ associated with
vertices. Let $\{u^\alpha_{i \to j}\}_{\alpha \in F,\ i,j \in \alpha}$ be a set of matrices $u^\alpha_{i \to j} \in M(r_j, r_i)$. Our additional assumption
on the set of matrix weights, which is the argument of the zeta function, is that

$$r_e := r_{t(e)} \quad \text{and} \quad u_{e' \to e} := u^{s(e)}_{t(e') \to t(e)}.$$

Then the graph zeta function can be seen as a function of $u = \{u^\alpha_{i \to j}\}$. With slight abuse of
notation, it is also denoted by $\zeta_H(u)$. Later in Section 4, $r_i$ corresponds to the dimension
of the sufficient statistic $\phi_i$, and $u^\alpha_{i \to j}$ to the matrix $\mathrm{Var}_{b_j}[\phi_j]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i]$.

Figure 6: Illustration for the definition of $\iota(u)$.



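To state the Ihara-Bass type determinant formula, we introduce a linear operator
$\iota(u) : X(\vec{E}) \to X(\vec{E})$ defined by

$$(\iota(u) f)(e) := \sum_{e' :\ s(e') = s(e),\ t(e') \neq t(e)} u^{s(e)}_{t(e') \to t(e)} f(e').$$

The matrix representation of $\iota(u)$ is a block diagonal matrix because it acts on each factor
separately. Therefore $I + \iota(u)$ is also a block diagonal matrix. Each block is indexed by
$\alpha \in F$ and denoted by $U_\alpha$. Thus, for $\alpha = \{i_1, \ldots, i_{d_\alpha}\}$,

$$U_\alpha = \begin{pmatrix} I_{r_{i_1}} & u^\alpha_{i_2 \to i_1} & \cdots & u^\alpha_{i_{d_\alpha} \to i_1} \\ u^\alpha_{i_1 \to i_2} & I_{r_{i_2}} & \cdots & u^\alpha_{i_{d_\alpha} \to i_2} \\ \vdots & \vdots & \ddots & \vdots \\ u^\alpha_{i_1 \to i_{d_\alpha}} & u^\alpha_{i_2 \to i_{d_\alpha}} & \cdots & I_{r_{i_{d_\alpha}}} \end{pmatrix}. \qquad (18)$$

We also define $w^\alpha_{i \to j}$ by the elements of $W_\alpha = U_\alpha^{-1}$:

$$W_\alpha = \begin{pmatrix} w^\alpha_{i_1 \to i_1} & w^\alpha_{i_2 \to i_1} & \cdots & w^\alpha_{i_{d_\alpha} \to i_1} \\ w^\alpha_{i_1 \to i_2} & w^\alpha_{i_2 \to i_2} & \cdots & w^\alpha_{i_{d_\alpha} \to i_2} \\ \vdots & \vdots & \ddots & \vdots \\ w^\alpha_{i_1 \to i_{d_\alpha}} & w^\alpha_{i_2 \to i_{d_\alpha}} & \cdots & w^\alpha_{i_{d_\alpha} \to i_{d_\alpha}} \end{pmatrix}. \qquad (19)$$

Similar to the definition of $X(\vec{E})$ in Subsection 3.1, we define $X(V)$ as the set of functions
on $V$ that take values in $\mathbb{C}^{r_i}$ for each $i \in V$.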



Theorem 7 (Determinant formula of Ihara-Bass type) Let $\mathcal{D}$ and $\mathcal{W}$ be linear
transforms on $X(V)$ defined by

$$(\mathcal{D} g)(i) := d_i\, g(i), \qquad (\mathcal{W} g)(i) := \sum_{e, e' \in \vec{E} :\ t(e) = i,\ s(e) = s(e')} w^{s(e)}_{t(e') \to t(e)}\, g(t(e')). \qquad (20)$$

Then, we have the following formula:

$$\zeta_H(u)^{-1} = \det(I_{r_V} - \mathcal{D} + \mathcal{W}) \prod_{\alpha \in F} \det U_\alpha,$$

where $r_V := \sum_{i \in V} r_i$.

The proof is given in Appendix A.2.

3.3 Ihara-Bass type determinant formula on ordinary graphs

In this subsection, we explicitly write the definitions and the above formula for ordinary graphs
for better understanding. A hypergraph $H = (V, F)$ which has only hyperedges of degree two is naturally
identified with an (undirected) graph $G_H = (V, E)$. In the next section, we see that this
case corresponds to the pairwise inference family.

First, we define the zeta function $Z_G$ of a graph $G = (V, E)$. For each undirected edge,
we make a pair of oppositely directed edges, which form a set of directed edges $\vec{E}$. Thus
$|\vec{E}| = 2|E|$. For each directed edge $e \in \vec{E}$, $o(e) \in V$ is the origin of $e$ and $t(e) \in V$ is
the terminus of $e$. For $e \in \vec{E}$, the inverse edge is denoted by $\bar{e}$, and the corresponding
undirected edge by $[e] = [\bar{e}] \in E$.

A closed geodesic in $G$ is a sequence $(e_1, \ldots, e_k)$ of directed edges such that $t(e_i) = o(e_{i+1})$
and $e_i \neq \bar{e}_{i+1}$ for $i \in \mathbb{Z}/k\mathbb{Z}$. Prime cycles are defined in the same manner as for
hypergraphs. The set of prime cycles is denoted by $\mathfrak{P}_G$.

Definition 8 Let $G = (V, E)$ be a graph. For given positive integers $\{r_i\}_{i \in V}$ and matrix
weights $u = \{u_e\}_{e \in \vec{E}}$ with $u_e \in M(r_{t(e)}, r_{o(e)})$,

$$Z_G(u) := \prod_{p \in \mathfrak{P}_G} \det(I - \pi(p))^{-1}, \qquad \pi(p) := u_{e_k} \cdots u_{e_2} u_{e_1} \quad \text{for } p = (e_1, \ldots, e_k).$$

Since $\mathfrak{P}_{G_H}$ is naturally identified with $\mathfrak{P}_H$, $Z_{G_H} = \zeta_H$ holds. This zeta function is the
matrix weight extension of the edge zeta function analyzed by Stark and Terras (1996).

Since the degree of every hyperedge is equal to two for a graph, the matrix $W_\alpha$ defined
in Eq. (19) has explicit expressions. Using this fact, we obtain the following simplification
of Theorem 7.



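Corollary 9 For a graph $G = (V, E)$,

$$Z_G(u)^{-1} = \det(I_{r_V} + \mathcal{D}(u) - \mathcal{A}(u)) \prod_{[e] \in E} \det(I_{r_{t(e)}} - u_e u_{\bar{e}}), \qquad (21)$$

where $\mathcal{D}$ and $\mathcal{A}$ are defined by

$$(\mathcal{D}(u) g)(i) := \sum_{e :\ t(e) = i} (I_{r_i} - u_e u_{\bar{e}})^{-1} u_e u_{\bar{e}}\, g(i), \qquad (22)$$

$$(\mathcal{A}(u) g)(i) := \sum_{e :\ t(e) = i} (I_{r_i} - u_e u_{\bar{e}})^{-1} u_e\, g(o(e)). \qquad (23)$$

Proof For $e = (i \leftarrow j)$, the $U_{[e]}$ block is given by

$$U_{[e]} = \begin{pmatrix} I_{r_i} & u_e \\ u_{\bar{e}} & I_{r_j} \end{pmatrix}.$$

Therefore $\det U_{[e]} = \det(I_{r_i} - u_e u_{\bar{e}})$ and the inverse $W_{[e]}$ is

$$W_{[e]} = \begin{pmatrix} (I_{r_i} - u_e u_{\bar{e}})^{-1} & -(I_{r_i} - u_e u_{\bar{e}})^{-1} u_e \\ -(I_{r_j} - u_{\bar{e}} u_e)^{-1} u_{\bar{e}} & (I_{r_j} - u_{\bar{e}} u_e)^{-1} \end{pmatrix}.$$

Plugging these equations into Theorem 7, we obtain the assertion.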



Mizuno and Sato (2004) and Horton et al. (2008) have derived a weighted graph version of the
Ihara-Bass type determinant formula under the assumption that the scalar weights $\{u_e\}$ satisfy
the condition $u_e u_{\bar{e}} = u^2$. In this case, the factors $(1 - u_e u_{\bar{e}})^{-1}$ in Eqs. (22) and (23) do not depend on
$e$ and Eq. (21) is further simplified. Corollary 9 extends the result to graphs
with arbitrary weights. A direct proof of Corollary 9, without discussing hypergraphs, is
found in the supplementary material of Watanabe and Fukumizu (2009).

If all the weights are set to $u$, the result reduces to the following formula, known as the
Ihara-Bass formula:

$$Z_G(u)^{-1} = (1 - u^2)^{|E| - |V|} \det(I - u\mathcal{A} + u^2(\mathcal{D} - I)),$$

where $\mathcal{D}$ is the degree matrix defined by $\mathcal{D}_{ij} = d_i \delta_{ij}$, and $\mathcal{A}$ is the adjacency matrix defined by

$$\mathcal{A}_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Many authors have discussed the proof of the Ihara-Bass formula. The first proof was
given by Bass (1992). See Kotani and Sunada (2000); Stark and Terras (1996) for others.
A combinatorial proof is given by Foata and Zeilberger (1999).
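The Ihara-Bass formula can be checked numerically on a small graph. The sketch below is an illustration under stated assumptions: the graph (the complete graph $K_4$) and the value $u = 0.2$ are arbitrary choices, and the helper `det` is a hypothetical name. It compares $\det(I - u\mathcal{M})$, computed from the $12 \times 12$ directed edge matrix, with the right-hand side, which only involves $4 \times 4$ matrices.

```python
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-14:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


V = range(4)
E = list(combinations(V, 2))              # undirected edges of K4, stored as (i, j) with i < j
edges = E + [(j, i) for (i, j) in E]      # both orientations, as (origin, terminus)
u = 0.2

# Left-hand side: det(I - u M), where M[e][e'] = 1 if t(e') = o(e) and e' is not the inverse of e.
n = len(edges)
lhs = det([[(1.0 if a == b else 0.0)
            - (u if edges[b][1] == edges[a][0] and edges[b][0] != edges[a][1] else 0.0)
            for b in range(n)] for a in range(n)])

# Right-hand side: (1 - u^2)^{|E| - |V|} det(I - u A + u^2 (D - I)).
deg = [sum(1 for e in E if i in e) for i in V]
B = [[(1.0 + u * u * (deg[i] - 1) if i == j else 0.0)
      - (u if (min(i, j), max(i, j)) in E else 0.0)
      for j in V] for i in V]
rhs = (1 - u * u) ** (len(E) - len(V)) * det(B)

print(abs(lhs - rhs) < 1e-10)  # True
```

The practical point is the size reduction: the left-hand side works on $2|E|$ directed edges, the right-hand side on $|V|$ vertices.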



3.4 Positive definiteness condition 

The Ihara-Bass type determinant formula relates the matrices $\mathcal{M}(u)$ and $(I_{r_V} - \mathcal{D} + \mathcal{W})$.
In the later sections, we see that $\mathcal{M}(u)$ corresponds to the derivative of the LBP update
and $(I_{r_V} - \mathcal{D} + \mathcal{W})$ is closely related to the Hessian of the Bethe free energy function.
The following theorem is fundamental to the proof of Theorem 14.



Theorem 10 Assume that $u = \{u^\alpha_{i \to j}\}_{\alpha \in F,\ i,j \in \alpha}$ satisfies $(u^\alpha_{i \to j})^T = u^\alpha_{j \to i}$ and $\|u^\alpha_{i \to j}\| < 1$,
where $\|\cdot\|$ is an arbitrary operator norm. If $\mathrm{Spec}(\mathcal{M}(u)) \subset \mathbb{C} \setminus \mathbb{R}_{\geq 1}$, where $\mathrm{Spec}(\cdot)$
denotes the set of eigenvalues, then $(I_{r_V} - \mathcal{D} + \mathcal{W})$ is a positive definite matrix.

Proof From the assumption of symmetry, $W_\alpha$ in Eq. (19) is a symmetric matrix. Therefore,
$\mathcal{W}$, defined in Eq. (20), is also symmetric. To prove the positive definiteness, we
define $u^\alpha_{i \to j}(t) := t\, u^\alpha_{i \to j}$ ($t \in [0, 1]$), which implies $\mathcal{M}(u(t)) = t \mathcal{M}(u)$. From the assumption,
$U_\alpha(t)$ is invertible and thus $W_\alpha(t)$ is well defined for all $t$. If $t = 0$, $\mathcal{W}(0) = \mathcal{D}$
and $I - \mathcal{D} + \mathcal{W}(0) = I$ is obviously positive definite. Since the eigenvalues of a symmetric
matrix are real and continuous with respect to its entries, it is enough to prove that
$\det(I - \mathcal{D} + \mathcal{W}(t)) \neq 0$ on the interval $[0, 1]$. Under the condition on the eigenvalues of
$\mathcal{M}(u)$, $\det(I - \mathcal{M}(u(t))) \neq 0$ holds for $t \in [0, 1]$. Therefore, Theorem 7 implies the claim.



4. Main theoretical results 

In this section, we establish the connection between the graph zeta function and the Bethe
free energy function. These results form a basis of the analyses in later sections.
In Subsection 4.1, we prove a formula using the Ihara-Bass type determinant formula
proved in the previous section. The formula shows a concrete relation between the Bethe
free energy function and the graph zeta function. In Subsection 4.2, we give a condition
under which the Hessian of the Bethe free energy function is positive definite.

4.1 Bethe-zeta formula 

In this subsection, we show that the determinant of the Hessian of the Bethe free energy
function is essentially equal to the reciprocal of the graph zeta function (see footnote 6).

In order to make the assertion clear, we first recall the definitions and notations. Let
$H = (V, F)$ be a hypergraph and let $\mathcal{I} = \{\mathcal{E}_\alpha, \mathcal{E}_i\}$ be an inference family on $H$. The exponential
families $\mathcal{E}_i$ and $\mathcal{E}_\alpha$ are given by sufficient statistics $\phi_i$ and $\phi_\alpha$ as discussed in Subsection 2.2.1.
Furthermore, as discussed in Subsection 2.3, a point $\eta = \{\eta_{\langle\alpha\rangle}, \eta_i\} \in L$ is identified with a
set of pseudomarginals $\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F,\ i \in V}$.

Theorem 11 At any point $\eta = \{\eta_{\langle\alpha\rangle}, \eta_i\} \in L$ the following equality holds:

$$\zeta_H(u)^{-1} = \det(I - \mathcal{M}(u)) = \det(\nabla^2 F) \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[\phi_\alpha]) \prod_{i \in V} \det(\mathrm{Var}_{b_i}[\phi_i])^{1 - d_i},$$

where

$$u^\alpha_{i \to j} := \mathrm{Var}_{b_j}[\phi_j]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i] \qquad (24)$$

is an $r_j \times r_i$ matrix, and $\nabla^2 F$ is the Hessian matrix with respect to the coordinate $\{\eta_{\langle\alpha\rangle}, \eta_i\}$.

Note that the Hessian $\nabla^2 F$ does not depend on the given compatibility functions $\Psi_\alpha$ because
those only affect linear terms in $F$, and thus the formula is a property of the inference family
$\mathcal{I}$. Note also that the determinants of variances in the formula are always positive, because
we assume all the local exponential families $\mathcal{E}_\alpha$ and $\mathcal{E}_i$ have positive definite covariance
matrices.

The proof is based on the Ihara-Bass type determinant formula; we check that the
Hessian $\nabla^2 F$ is related to the matrix $(I - \mathcal{D} + \mathcal{W})$ if the weights have the form of Eq. (24). The
key condition satisfied on the set $L$ is $\mathrm{Var}_{b_\alpha}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]$.

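Proof Let $\varphi_\alpha$ and $\varphi_i$ denote the negative entropy terms of $F$, i.e. the Legendre transforms of
the log partition functions of $\mathcal{E}_\alpha$ and $\mathcal{E}_i$. From the definition of the Bethe free energy function
Eq. (16), the (V,V)-block of $\nabla^2 F$ is given by

$$\frac{\partial^2 F}{\partial\eta_i \partial\eta_j} = \sum_{\alpha \supset \{i,j\}} \frac{\partial^2 \varphi_\alpha}{\partial\eta_i \partial\eta_j} + \delta_{i,j}(1 - d_i) \frac{\partial^2 \varphi_i}{\partial\eta_i \partial\eta_i}.$$

The (V,F)-block and (F,F)-block are given by

$$\frac{\partial^2 F}{\partial\eta_i \partial\eta_{\langle\alpha\rangle}} = \frac{\partial^2 \varphi_\alpha}{\partial\eta_i \partial\eta_{\langle\alpha\rangle}}, \qquad \frac{\partial^2 F}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\beta\rangle}} = \delta_{\alpha,\beta} \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\alpha\rangle}}.$$

Using the diagonal blocks of the (F,F)-block, we erase the (V,F)-block and (F,V)-block of the
Hessian by Gaussian elimination. In other words, we choose a square matrix $X$ such that
$\det X = 1$ and

$$X^T (\nabla^2 F) X = \begin{pmatrix} Y & 0 \\ 0 & \left( \frac{\partial^2 F}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\beta\rangle}} \right) \end{pmatrix}, \qquad (25)$$

in which

$$Y_{i,j} = \sum_{\alpha \supset \{i,j\}} \left\{ \frac{\partial^2 \varphi_\alpha}{\partial\eta_i \partial\eta_j} - \frac{\partial^2 \varphi_\alpha}{\partial\eta_i \partial\eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_j} \right\} + \delta_{i,j}(1 - d_i) \frac{\partial^2 \varphi_i}{\partial\eta_i \partial\eta_i}. \qquad (26)$$

On the other hand, since $u^\alpha_{i \to j} := \mathrm{Var}_{b_j}[\phi_j]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i]$, the matrix $U_\alpha$ defined in Eq. (18) is

$$U_\alpha = \mathrm{diag}(\mathrm{Var}[\phi_i]^{-1} \mid i \in \alpha)\ \mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]. \qquad (27)$$

Since the matrix $\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]$ is a submatrix of $\mathrm{Var}_{b_\alpha}[\phi_\alpha]$, its inverse can be expressed
by submatrices of $\mathrm{Var}_{b_\alpha}[\phi_\alpha]^{-1} = \frac{\partial^2 \varphi_\alpha}{\partial\eta_\alpha \partial\eta_\alpha}$ using the Schur complement formula, which shows
that the elements of $W_\alpha = U_\alpha^{-1}$ are given by

$$w^\alpha_{i \to j} = \left\{ \frac{\partial^2 \varphi_\alpha}{\partial\eta_j \partial\eta_i} - \frac{\partial^2 \varphi_\alpha}{\partial\eta_j \partial\eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_i} \right\} \mathrm{Var}_{b_i}[\phi_i]. \qquad (28)$$

It follows from Eqs. (25), (26) and (28) that

$$Y\ \mathrm{diag}(\mathrm{Var}[\phi_i] \mid i \in V) = I - \mathcal{D} + \mathcal{W},$$

where $\mathcal{D}$ and $\mathcal{W}$ are defined in Eq. (20). Accordingly, we obtain

$$\zeta_H(u)^{-1} = \det(I - \mathcal{D} + \mathcal{W}) \prod_{\alpha \in F} \det U_\alpha$$
$$= \det(Y) \prod_{i \in V} \det(\mathrm{Var}[\phi_i]) \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]) \prod_{i \in \alpha} \det(\mathrm{Var}[\phi_i])^{-1}$$
$$= \det(\nabla^2 F) \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[\phi_\alpha]) \prod_{i \in V} \det(\mathrm{Var}_{b_i}[\phi_i])^{1 - d_i},$$

where $\det(\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]) \det\left( \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\alpha\rangle}} \right)^{-1} = \det(\mathrm{Var}[\phi_\alpha])$ is used.

6. An intuitive understanding of this result, based on the Legendre duality of two types of the Bethe free
energy functions, is discussed by Watanabe (2010).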

In the rest of this subsection, we rewrite the Bethe-zeta formula in some specific cases.
In particular, we give explicit expressions of the determinants of the variance matrices.

Case 1: Multinomial inference family

First, we consider the multinomial case. If we take the sufficient statistics of the multinomial
exponential family as in Example 1, the determinant of the variance is

$$\det(\mathrm{Var}_p[\phi]) = \prod_{k=1}^{N} p(k).$$

Therefore, the theorem reduces to the following form (see footnote 7).



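Corollary 12 (Bethe-zeta formula for multinomial inference family)
For any $\{b_\alpha(x_\alpha), b_i(x_i)\} \in L$ the following equality holds:

$$\zeta_H(u)^{-1} = \det(\nabla^2 F) \prod_{\alpha \in F} \prod_{x_\alpha} b_\alpha(x_\alpha) \prod_{i \in V} \prod_{x_i} b_i(x_i)^{1 - d_i},$$

where $u^\alpha_{i \to j} = \mathrm{Var}_{b_j}[\phi_j]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i]$ is an $(N_j - 1) \times (N_i - 1)$ matrix.

For the binary and pairwise case, this formula was first shown by Watanabe and Fukumizu (2009).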



Case 2: Fixed-mean Gaussian inference family 

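Let $G = (V, E)$ be a graph. We consider the fixed-mean Gaussian inference family on $G$.
For a given vector $\mu = (\mu_i)_{i \in V}$, the inference family is constructed from the sufficient statistics

$$\phi_i(x_i) = (x_i - \mu_i)^2 \quad \text{and} \quad \phi_{\langle ij \rangle}(x_i, x_j) = (x_i - \mu_i)(x_j - \mu_j).$$

Their expectation parameters are denoted by $\eta_{ii}$ and $\eta_{ij}$, respectively. The variances and covariances are

$$\mathrm{Var}[\phi_i] = 2\eta_{ii}^2, \qquad \mathrm{Var}[\phi_{\{i,j\}}] = \begin{pmatrix} 2\eta_{ii}^2 & 2\eta_{ij}^2 & 2\eta_{ii}\eta_{ij} \\ 2\eta_{ij}^2 & 2\eta_{jj}^2 & 2\eta_{jj}\eta_{ij} \\ 2\eta_{ii}\eta_{ij} & 2\eta_{jj}\eta_{ij} & \eta_{ij}^2 + \eta_{ii}\eta_{jj} \end{pmatrix},$$

where $\phi_{\{i,j\}}(x_i, x_j) = ((x_i - \mu_i)^2, (x_j - \mu_j)^2, (x_i - \mu_i)(x_j - \mu_j))$. Therefore,
$\det(\mathrm{Var}[\phi_{\{i,j\}}]) = 4(\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3$.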
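The determinant identity $\det(\mathrm{Var}[\phi_{\{i,j\}}]) = 4(\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3$ is easy to confirm numerically. The Python sketch below is a minimal check, assuming the covariance entries are the standard fourth-moment expressions of a bivariate Gaussian; the function names `det3` and `var_phi_pair` and the numerical values are hypothetical choices for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def var_phi_pair(eta_ii, eta_jj, eta_ij):
    """Covariance matrix of ((x_i - mu_i)^2, (x_j - mu_j)^2, (x_i - mu_i)(x_j - mu_j))."""
    return [
        [2 * eta_ii ** 2, 2 * eta_ij ** 2, 2 * eta_ii * eta_ij],
        [2 * eta_ij ** 2, 2 * eta_jj ** 2, 2 * eta_jj * eta_ij],
        [2 * eta_ii * eta_ij, 2 * eta_jj * eta_ij, eta_ij ** 2 + eta_ii * eta_jj],
    ]


eta_ii, eta_jj, eta_ij = 2.0, 1.5, 0.7   # any values with eta_ii * eta_jj > eta_ij^2
lhs = det3(var_phi_pair(eta_ii, eta_jj, eta_ij))
rhs = 4 * (eta_ii * eta_jj - eta_ij ** 2) ** 3
print(abs(lhs - rhs) < 1e-9)  # True
```

The cubic dependence on $\eta_{ii}\eta_{jj} - \eta_{ij}^2$ is what makes the determinant vanish as the pairwise covariance approaches the boundary of the positive definite region.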



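Corollary 13 (Bethe-zeta formula for fixed-mean Gaussian inference family) For any
$\{\eta_{ii}, \eta_{ij}\} \in L$ the following equality holds:

$$\zeta_G(u)^{-1} = \det(\nabla^2 F)\ 2^{|V|} \prod_{ij \in E} (\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3 \prod_{i \in V} \eta_{ii}^{2(1 - d_i)},$$

where $u_{i \to j} := \eta_{ij}^2 \eta_{jj}^{-2}$ is a scalar value.

One interesting point of this case is that the edge weights $u_{i \to j}$ are always positive.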



7. Here, we ignore minor constant factors which come from the choices of sufficient statistics.

4.2 Positive definiteness condition

In this section, we derive a condition that guarantees the positive definiteness of the Hessian
of the Bethe free energy function. It is based on Theorem 10, which gives a condition under which
the matrix $(I - \mathcal{D} + \mathcal{W})$ is positive definite in terms of the matrix $\mathcal{M}(u)$. As we have seen
in the proof of the previous theorem, $\nabla^2 F$ and $(I - \mathcal{D} + \mathcal{W})$ are essentially the same. Thus,
we obtain the following theorem.

Theorem 14 Let $u$ be given by $\eta \in L$ using Eq. (24). Then,

$$\mathrm{Spec}(\mathcal{M}(u)) \subset \mathbb{C} \setminus \mathbb{R}_{\geq 1} \implies \nabla^2 F(\eta) \text{ is a positive definite matrix.}$$

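Before the proof of the theorem, we remark the following fact. It implies that we can
replace the matrix weights with the correlation coefficient matrices.

Lemma 15 Let $\eta$ be a point in $L$, $u^\alpha_{i \to j}$ be given by Eq. (24), and

$$c^\alpha_{i \to j} := \mathrm{Cor}_{b_\alpha}[\phi_j, \phi_i] = \mathrm{Var}_{b_j}[\phi_j]^{-1/2}\, \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i]\, \mathrm{Var}_{b_i}[\phi_i]^{-1/2}$$

be the correlation coefficient matrix, where $b_\alpha$ corresponds to $\eta$. Then

$$\mathrm{Spec}(\mathcal{M}(u)) = \mathrm{Spec}(\mathcal{M}(c)). \qquad (29)$$

Proof Define $Z$ by $(Z)_{e,e'} := \delta_{e,e'} \mathrm{Var}[\phi_{t(e)}]^{1/2}$. Then

$$(Z \mathcal{M}(u) Z^{-1})_{e,e'} = \mathrm{Var}[\phi_{t(e)}]^{1/2}\, \mathcal{M}(u)_{e,e'}\, \mathrm{Var}[\phi_{t(e')}]^{-1/2} = \mathcal{M}(c)_{e,e'}.$$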


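Proof [Theorem 14] By definition, $(c^\alpha_{i \to j})^T = c^\alpha_{j \to i}$ holds. We choose the operator norm induced
by the inner product of the vector spaces. In other words, $\|X\|$ is equal to the maximum
singular value of $X$. In this case, it is well known that the norm of a correlation coefficient
matrix is smaller than 1. By Lemma 15, the assumption on $\mathrm{Spec}(\mathcal{M}(u))$ carries over to
$\mathrm{Spec}(\mathcal{M}(c))$. From Theorem 10, the matrix $(I - \mathcal{D} + \mathcal{W})$ for the weights
$c = \{c^\alpha_{i \to j}\}$ is positive definite.

Next, we compute the matrix $(I - \mathcal{D} + \mathcal{W})$ for the weights $c$. Similar to Eq. (28), we
obtain

$$w^\alpha_{i \to j} = \mathrm{Var}_{b_j}[\phi_j]^{1/2} \left\{ \frac{\partial^2 \varphi_\alpha}{\partial\eta_j \partial\eta_i} - \frac{\partial^2 \varphi_\alpha}{\partial\eta_j \partial\eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial\eta_{\langle\alpha\rangle} \partial\eta_i} \right\} \mathrm{Var}_{b_i}[\phi_i]^{1/2}.$$

Therefore, using the same notations as in the proof of Theorem 11, we have

$$\mathrm{diag}(\mathrm{Var}[\phi_i] \mid i \in V)^{1/2}\ Y\ \mathrm{diag}(\mathrm{Var}[\phi_i] \mid i \in V)^{1/2} = (I - \mathcal{D} + \mathcal{W}).$$

This equation implies that $Y$ and $\nabla^2 F(\eta)$ are positive definite. ■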



To check the condition of Theorem 14, we need to analyze the extent of the eigenvalues.
An easy way of narrowing down the possible region is to bound the spectral radius. For
a given square matrix $X$, the spectral radius of $X$ is the maximum of the modulus of the
eigenvalues; it is denoted by $\rho(X)$. The following proposition provides a useful bound. The
proof is given in Appendix A.2.

Proposition 16 Let $u = \{u^\alpha_{i \to j}\}$ be arbitrary matrix weights and let $\|u\| = \{\|u^\alpha_{i \to j}\|\}$ be
the scalar weights obtained by an arbitrary operator norm. Then,

$$\rho(\mathcal{M}(u)) \leq \rho(\mathcal{M}(\|u\|)) \leq \max \|u^\alpha_{i \to j}\|\ \rho(\mathcal{M}).$$



5. Analysis of positive definiteness and convexity of BFE 

The Bethe free energy function is not necessarily convex, though it is an approximation of
the Gibbs free energy function, which is convex. Non-convexity of the Bethe free energy
can lead to multiple fixed points. Pakzad and Anantharam (2002) and Heskes (2004) have
derived sufficient conditions for convexity and shown that the Bethe free energy is convex
for trees and graphs with one cycle. In this section, we focus not only on such global structure
but also on the local structure of the Bethe free energy function, i.e. the Hessian. Our
approach derives the region where the positive definiteness is broken. All the results are
based on the techniques developed in the previous section.

In Subsection 5.1, as an application of the positive definiteness condition, we analyze the
region where the Hessian of the Bethe free energy function is positive definite. The Hessian does
not depend on the given compatibility functions, because they appear only in the linear part of
the Bethe free energy function. In Subsection 5.2, we deal with the compatibility functions
by restricting the Bethe free energy function to a subset $S(\Psi)$ of $L$. This set consists of the
pseudomarginals whose natural parameters $\{\theta_{\langle\alpha\rangle}\}$ are those of the given compatibility
functions, and thus includes all the fixed point beliefs. We will see that the problem of the
uniqueness of the LBP fixed points is reduced to the following problem: is the subset $S(\Psi)$
included in the positive definite region of the original Bethe free energy function?



5.1 Region of positive definite and convexity condition 

In this subsection, we simplify Theorem 14 and explicitly see that if the correlation coefficient
matrices of the pseudomarginals are sufficiently small, then the Hessian is positive
definite. This "smallness" criterion depends on the graph geometry.

In the following, we choose the operator norm that is equal to the maximum singular
value. It is well known that the norm of a correlation coefficient matrix is smaller than 1
under the assumption that the variance-covariance matrix is non-degenerate.

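Corollary 17 (Positive definite region) Let $\kappa$ be the Perron-Frobenius eigenvalue of
$\mathcal{M}$, and define

$$L_{\kappa^{-1}}(\mathcal{I}) := \{ \{b_\alpha(x_\alpha), b_i(x_i)\} \in L(\mathcal{I}) \mid \forall \alpha \in F,\ \forall i, j \in \alpha,\ \|\mathrm{Cor}_{b_\alpha}[\phi_i, \phi_j]\| < \kappa^{-1} \}.$$

Then, the Hessian $\nabla^2 F$ is positive definite on $L_{\kappa^{-1}}(\mathcal{I})$.

Proof From Proposition 16 and $\max \|c^\alpha_{i \to j}\|\, \kappa < 1$, $\mathrm{Spec}(\mathcal{M}(c)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$.
Therefore, from Lemma 15 and Theorem 14, the Hessian is positive definite at the point. ■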



A bound of the Perron-Frobenius eigenvalue of $\mathcal{M}$ is given in Subsection A.1. Roughly
speaking, as the degrees of factors and vertices increase, $\kappa$ also increases and thus $L_{\kappa^{-1}}$
shrinks. The Perron-Frobenius eigenvalue is equal to 0 (resp. 1) if the hypergraph is a tree
(resp. has a unique cycle). This result suggests that LBP works better for graphs of low
degree.
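To make the dependence of $\kappa$ on the graph concrete, the following Python sketch estimates the Perron-Frobenius eigenvalue of the directed edge matrix for three ordinary graphs. It is an illustration only: the helper names are hypothetical, the graphs are arbitrary small examples, and plain power iteration is assumed to be adequate (it is exact for these examples, but it may fail to converge for general periodic matrices).

```python
def directed_edge_matrix(undirected_edges):
    """Unweighted M: M[e][e'] = 1 if t(e') = o(e) and e' is not the inverse of e."""
    edges = list(undirected_edges) + [(j, i) for (i, j) in undirected_edges]
    return [[1.0 if (p[1] == e[0] and p[0] != e[1]) else 0.0 for p in edges]
            for e in edges]


def perron_frobenius(m, iters=500):
    """Estimate the largest eigenvalue of a nonnegative matrix by power iteration."""
    n = len(m)
    v, lam = [1.0] * n, 0.0
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(n)) for a in range(n)]
        lam = max(w)
        if lam == 0.0:          # nilpotent matrix (tree): no closed geodesic at all
            return 0.0
        v = [x / lam for x in w]
    return lam


path = [(0, 1), (1, 2)]                                  # a tree
triangle = [(0, 1), (1, 2), (0, 2)]                      # a single cycle
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # complete graph on 4 vertices

kappas = [perron_frobenius(directed_edge_matrix(g)) for g in (path, triangle, k4)]
print(kappas)  # [0.0, 1.0, 2.0]
```

For $K_4$ the value $\kappa = 2$ means that Corollary 17 guarantees positive definiteness only when all correlation coefficient norms are below $1/2$, whereas for the tree and the single cycle no restriction beyond non-degeneracy is imposed.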

The convexity of $F$ depends solely on the given inference family and the underlying
hypergraph, because the Hessian $\nabla^2 F$ does not depend on the given compatibility functions
$\Psi = \{\Psi_\alpha\}$. For the multinomial case, Pakzad and Anantharam (2002) have shown that the
Bethe free energy function is convex if the hypergraph has at most one cycle. The following
theorem extends the result. To show (i), we only have to analyze the Bethe
free energy function on trees and one-cycle hypergraphs. To show (ii), however, we need to
capture the effect of cycles on arbitrary hypergraphs.

Theorem 18 Let $H$ be a connected hypergraph.

(i) If $n(H) = 0$ or $1$, then $F$ is convex on $L$.

(ii) Assuming the inference family is either a multinomial, Gaussian or fixed-mean Gaussian
family, the converse of (i) holds.

Proof (i) As we have mentioned above, the Perron-Frobenius eigenvalue $\kappa$ of $\mathcal{M}$ is equal
to 0 if $n(H) = 0$ and to 1 if $n(H) = 1$. Using Corollary 17, we obtain $L_{\kappa^{-1}} = L$. Therefore,
the Bethe free energy function is convex over the domain $L$.

(ii) Here we show the proof only in the case of the fixed-mean Gaussian family. (Other cases are
proved in a similar way in Appendix A.) Let $G = (V, E)$ be a graph. For $t \in [0, 1)$, define
$\eta_{ii}(t) := 1$ and $\eta_{ij}(t) := t$. Accordingly, $u_{i \to j} = t^2$ and $\eta(t) \in L$. As $t \to 1$, $\eta(t)$ approaches
a boundary point of $L$. From Theorem 27 in Appendix A.1,

$$\det(\nabla^2 F(t))\,(1 - t^2)^{2|E| + |V| - 1} = 2^{-|V|} Z_G(t^2)^{-1} (1 - t^2)^{-|E| + |V| - 1}$$
$$\longrightarrow -2^{|E| - 2|V| + 1} (|E| - |V|)\, \kappa(G) \qquad (t \to 1),$$

where $\kappa(G)$ here denotes the number of spanning trees of $G$. If $n(G) = |E| - |V| + 1 > 1$, the limit
value is negative. Therefore, in a neighborhood of the limit point, $\nabla^2 F$ is not positive definite.



5.2 Convexity of restricted Bethe free energy function 

Our analysis so far has not involved the given compatibility function, because it disappears 
in the second derivatives. Not only graph structure, however, but also the compatibility 
functions affect the properties of LBP and the Bethe free energy. 

In this section, we show a method for dealing with the compatibility functions. We 
see that the understanding of the positive definite region helps us to deduce a uniqueness 
condition of LBP. 

5.2.1 Restricted Bethe free energy function 

First, we make simple observations. Since beliefs are given by Eqs. (10) and (11), they must satisfy
the following condition: for each factor $\alpha$, there exists $\{\theta_{\alpha:i}\}_{i \in \alpha}$ such that

$$b_\alpha(x_\alpha) \propto \exp\Big( \langle \theta_{\langle\alpha\rangle}, \phi_{\langle\alpha\rangle}(x_\alpha) \rangle + \sum_{i \in \alpha} \langle \theta_{\alpha:i}, \phi_i(x_i) \rangle \Big),$$

where $\theta_{\langle\alpha\rangle}$ is the natural parameter of $\Psi_\alpha$. (See Eq. (6).) In other words, all
the beliefs are always in the subset $S(\Psi)$ of $L$ that consists of the points satisfying this condition
for every factor $\alpha$.

We can take $\{\eta_i\}_{i \in V}$ as a coordinate of $S(\Psi)$ because $\eta_{\langle\alpha\rangle}$ is determined by $\{\eta_i\}_{i \in \alpha}$. We obtain
a function by restricting $F$ to this set and taking $\{\eta_i\}$ as arguments. The function is called the
restricted Bethe free energy function and denoted by $\hat{F}$. The following proposition says
that the stationary points of $\hat{F}$ also correspond to the LBP fixed points. (Recall that the same
fact holds for $F$: "the fixed points of LBP are the stationary points of the Bethe free energy
function.")

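Proposition 19

$$\frac{\partial \hat{F}(\eta)}{\partial \eta_i} = 0 \ \ \forall i \in V \iff \eta \in S(\Psi) \text{ is an LBP fixed point.}$$

Proof Using the chain rule of derivatives, we have

$$\frac{\partial \hat{F}}{\partial \eta_i} = \frac{\partial F}{\partial \eta_i} + \sum_{\alpha \ni i} \frac{\partial F}{\partial \eta_{\langle\alpha\rangle}} \frac{\partial \eta_{\langle\alpha\rangle}}{\partial \eta_i}.$$

From the definition of $F$ and $S(\Psi)$, we have $\frac{\partial F}{\partial \eta_{\langle\alpha\rangle}} = 0$ on the set $S(\Psi)$. Therefore, all the
derivatives of $\hat{F}$ are equal to zero if and only if those of $F$ are zero. ■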



5.2.2 Convexity condition and uniqueness 

In the following, we analyze the (strict) convexity of the restricted Bethe free energy function.
Our focus is on multinomial models. As a result, we provide a new condition that
guarantees the uniqueness. From Proposition 19, the LBP fixed point is unique if $\hat{F}$ is
strictly convex.

From the viewpoint of approximate inference, the uniqueness of the LBP fixed point is a
preferable property. Since the LBP algorithm is interpreted as the variational problem of the
Bethe free energy function, an LBP fixed point that corresponds to the global minimum is
believed to be the best one. If we find the unique fixed point of the LBP algorithm, it is
guaranteed to be the global minimum of $F$.

To understand the convexity of $\hat{F}$, we analyze the Hessian. It turns out that the positive
definiteness of the Hessian of this function is equivalent to that of the Hessian of $F$. Note that the
Hessian of $\hat{F}$ is of size "V" while that of $F$ is of size "V + F".

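Proposition 20 At any point in the set $S(\Psi)$,

$$\nabla^2 \hat{F} \text{ is positive definite} \iff \nabla^2 F \text{ is positive definite.}$$

Proof By taking the derivative of the equation

$$\frac{\partial F}{\partial \eta_{\langle\alpha\rangle}} = 0$$

on the set $S(\Psi)$, we obtain

$$\frac{\partial^2 F}{\partial \eta_{\langle\alpha\rangle} \partial \eta_i} + \sum_{\beta \in F} \frac{\partial^2 F}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\beta\rangle}} \frac{\partial \eta_{\langle\beta\rangle}}{\partial \eta_i} = 0.$$

This equation can be written as $\left( \frac{\partial \eta_{\langle\alpha\rangle}}{\partial \eta_i} \right) = -X_{F,F}^{-1} X_{F,V}$ using the notation

$$\nabla^2 F = \begin{pmatrix} X_{V,V} & X_{V,F} \\ X_{F,V} & X_{F,F} \end{pmatrix}.$$

A straightforward computation of the derivatives of $\hat{F}$ gives $\nabla^2 \hat{F} = X_{V,V} - X_{V,F} X_{F,F}^{-1} X_{F,V}$,
where we used the above equation. Since the block $X_{F,F}$ is always positive definite, the
statement is obvious. ■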

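If we can verify that the set $S(\Psi)$ is in the region where $\nabla^2 F$ is positive definite, we can
show that $\hat{F}$ is convex. Using Theorem 14, we obtain the following.

Theorem 21 Define

$$W^\alpha_{i \to j}(\Psi) := \sup \Big\{ \|\mathrm{Cor}_{b_\alpha}[\phi_i, \phi_j]\| \ \Big|\ b_\alpha(x_\alpha) \propto \Psi_\alpha(x_\alpha) \prod_{k \in \alpha} f_k(x_k),\ f_k \text{ positive functions} \Big\}.$$

If $\rho(\mathcal{M}(W)) < 1$ then $\hat{F}$ is strictly convex. Therefore, LBP has a unique fixed point.

Proof Let $\eta$ be any point in $S(\Psi)$. By definition, $\|\mathrm{Cor}_{b_\alpha}[\phi_i, \phi_j]\|$ is at most $W^\alpha_{i \to j}$.
From Theorem 14 and Proposition 16, $\nabla^2 F$ is positive definite at the point, and by
Proposition 20 so is $\nabla^2 \hat{F}$. ■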



In principle, we can compute the weights $W$ given the compatibility functions. However,
this requires optimization with respect to $f$; we can use standard numerical maximization
techniques (Venkataraman, 2009). We leave the development of efficient methods for computing
the values for future work. For the binary pairwise case, we have a useful formula
$W_{i \to j} = \tanh(|J_{ij}|)$, where $\Psi_{ij}(x_i, x_j) \propto e^{J_{ij} x_i x_j}$ (Watanabe and Fukumizu, 2009).
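For binary pairwise models the condition of Theorem 21 can be evaluated directly from the couplings. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the model is specified only by a dictionary of couplings $J_{ij}$ on undirected edges, and the spectral radius is estimated by plain power iteration, which is exact for the uniform-coupling example shown but is only an approximation in general.

```python
import math


def satisfies_uniqueness_condition(couplings, iters=500):
    """Check rho(M(W)) < 1 with W_{i->j} = tanh(|J_ij|) for a binary pairwise model.

    couplings: dict mapping an undirected edge (i, j) to its coupling J_ij.
    A False result only means the sufficient condition is not met.
    """
    edges = list(couplings) + [(j, i) for (i, j) in couplings]
    w = {}
    for (i, j), J in couplings.items():
        w[(i, j)] = w[(j, i)] = math.tanh(abs(J))
    n = len(edges)
    # Weighted directed edge matrix: non-backtracking transitions e' -> e carry weight w[e].
    m = [[w[e] if (p[1] == e[0] and p[0] != e[1]) else 0.0 for p in edges] for e in edges]
    v, rho = [1.0] * n, 0.0
    for _ in range(iters):
        x = [sum(m[a][b] * v[b] for b in range(n)) for a in range(n)]
        rho = max(x)
        if rho == 0.0:
            break
        v = [t / rho for t in x]
    return rho < 1.0


def k4(J):
    """Complete graph on 4 vertices with a uniform coupling J (a hypothetical test model)."""
    return {(i, j): J for i in range(4) for j in range(i + 1, 4)}


print(satisfies_uniqueness_condition(k4(0.5)))  # True:  2 * tanh(0.5) < 1
print(satisfies_uniqueness_condition(k4(0.6)))  # False: 2 * tanh(0.6) > 1
```

On $K_4$ every directed edge has two non-backtracking predecessors, so the condition reads $2\tanh(|J|) < 1$, i.e. $|J| < \operatorname{artanh}(1/2) \approx 0.549$.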



Theorem 21 holds for an arbitrary LBP. However, for Gaussian cases we obtain $W^\alpha_{i \to j} = 1$,
yielding no meaningful implications. Therefore, in the rest of this section, we focus on
multinomial cases.



5.2.3 Comparison to Mooij's condition 

For multinomial models, there are several works that give sufficient conditions for the
uniqueness property. Heskes (2004) analyzed the uniqueness problem by considering an
equivalent min-max problem. Other authors analyzed the convergence property rather
than the uniqueness. The LBP algorithm is said to be convergent if the messages converge
to the unique fixed point irrespective of the initial messages. By definition, this property
is stronger than the uniqueness. Tatikonda and Jordan (2002) utilized the theory of
Gibbs measures, and showed that the uniqueness of the Gibbs measure implies the
convergence of the LBP algorithm. Therefore, known sufficient conditions for the uniqueness of
the Gibbs measure are also sufficient for the convergence of the LBP algorithm. Ihler et al. and
Mooij and Kappen (2007) derived sufficient conditions for the convergence by investigating
conditions that make the LBP update a contraction; for the pairwise case, their conditions are
essentially the same.

We compare our condition with Mooij's condition. One reason is that this condition
is directly applicable to factor graph models, while Ihler's and Tatikonda's conditions
are written for pairwise models. Another reason is that numerical experiments by
Mooij and Kappen (2007) suggest that Mooij's condition is far superior to the condition
of Heskes. (See the numerical experiments in Mooij and Kappen (2007).)

Mooij's condition is stated as follows.



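Theorem 22 (Mooij and Kappen (2007)) Define

$$N_{i \to j}(\Psi_\alpha) := \sup_{x_i \neq x_i'}\ \sup_{x_j \neq x_j'}\ \sup_{x_{\alpha \setminus \{i,j\}},\, x'_{\alpha \setminus \{i,j\}}} \tanh\left( \frac{1}{4} \log \frac{\Psi_\alpha(x_i, x_j, x_{\alpha \setminus \{i,j\}})\, \Psi_\alpha(x_i', x_j', x'_{\alpha \setminus \{i,j\}})}{\Psi_\alpha(x_i', x_j, x_{\alpha \setminus \{i,j\}})\, \Psi_\alpha(x_i, x_j', x'_{\alpha \setminus \{i,j\}})} \right).$$

If $\rho(\mathcal{M}(N)) < 1$, then LBP is convergent. Therefore, LBP has a unique fixed point.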



Interestingly, this condition looks similar to our Theorem 21; both of them are stated
in terms of the spectral radius of the directed edge matrix $\mathcal{M}$ with weights. The comparison
of these conditions is reduced to that of $W_{ij}(\Psi_\alpha)$ and $N_{ij}(\Psi_\alpha)$. (Recall that for positive
matrices $X$ and $Y$, $\rho(X) \leq \rho(Y)$ if $X_{ij} \leq Y_{ij}$.) For the binary pairwise case, the conditions
coincide; it is not hard to check that $W_{ij} = N_{ij} = \tanh(|J_{ij}|)$.

Based on numerical computation, we conjecture that $W_{ij}(\Psi_\alpha) \leq N_{ij}(\Psi_\alpha)$ always holds. In
Figure 7, we show a plot for the case of $\Psi(x_1, x_2, x_3) = \exp(K x_1 x_2 x_3 + 0.5 \sum_{i<j} x_i x_j)$, where
$x_i \in \{\pm 1\}$. We observe that $W$ and $N$ coincide for large $|K|$, but $W$ is strictly smaller
than $N$ for small $|K|$.

Next, we compare the conditions of Theorems 21 and 22 with the actual LBP convergence region.
We run the LBP algorithm on the $3 \times 3$ square grid with cyclic boundary condition, where
the factors correspond to the vertices of the grid and the variables are on the edges. Thus, the
degree of factors is four and that of vertices is two. The variables are binary ($x_i \in \{\pm 1\}$) and the
compatibility functions are given in the form $\Psi(x_1, x_2, x_3, x_4) = \exp(K \sum_{i<j<k} x_i x_j x_k + J \sum_{i<j} x_i x_j)$.
We changed the parameters $K$ and $J$. All the messages are initialized to
constant functions and updated in parallel by Eq. (9). The result is plotted in Figure 7.
We judge LBP is convergent if message change is smaller than 10~^ after 30 iterations. We 
observe that there is a triangle region where uniqueness is guaranteed but LBP does not 
converge. 



6. Analysis of stability of LBP 

In this section, we analyze relations between the local stability of LBP and the local structure
of the Bethe free energy around an LBP fixed point. Since LBP is not the gradient descent
of the Bethe free energy function, such a relation is not necessarily obvious. From the
viewpoint of the variational formulation, we hope to find the minima. In the celebrated
paper by Yedidia et al. (2001), it was empirically found that locally stable LBP fixed points
are local minima of the Bethe free energy function; Heskes (2002) has shown this fact for the
multinomial case.

Figure 7: Left: Comparison of $W$ and $N$. The solid line is the plot of $W$ and the dashed line is
$N$. Right: Inside the dashed line region, LBP is guaranteed to converge by
Mooij's condition. Inside the solid line, LBP is guaranteed to have a unique
fixed point by Theorem 21. In the shaded region, LBP does not converge.

In the following, we extend the result in two directions. First, we derive conditions for
the local stability and the local minimality in terms of the eigenvalues of the matrix $\mathcal{M}(u)$, which
immediately implies the above fact. Secondly, the result is extended to LBPs formulated
by inference families, including both multinomial and Gaussian cases. This is possible since
our analysis is based on the techniques developed in the preceding sections.

6.1 LBP as a dynamical system 

First, we regard the LBP update as a dynamical system. At each time $t$, the state of the
algorithm is specified by the set of messages $\{m^t_{\alpha \to i}\}$, which is identified with its natural
parameters $\mu^t = \{\mu^t_{\alpha \to i}\} \in M$. In terms of these parameters, the update rule Eq. (9) is
written as follows.

where $\alpha = \{i_1, \ldots, i_{d_\alpha}\}$, $d_\alpha = k$, and $\Lambda_\alpha(\cdots)_i$ is the $i$-th component ($i \in \alpha$). To obtain this
equation, multiply Eq. (9) by

normalize it to be a probability density function, and then take the expectation of $\phi_i$.

Formally, this update rule can be viewed as a transform $T$ on the set $M$ of natural parameters
of messages:

$$T : M \to M, \qquad \mu^t = T(\mu^{t-1}).$$

The LBP algorithm can be formulated as repeated applications of this map. In this formulation,
the fixed points of LBP are $\{\mu^* \in M \mid \mu^* = T(\mu^*)\}$.
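
As a concrete illustration of this dynamical-system view, the following Python sketch iterates the map $T$ for a binary pairwise model on a triangle. It assumes the standard single-parameter form of binary messages, for which the update reads $\mu'_{j \to i} = \operatorname{atanh}(\tanh(J_{ij}) \tanh(h_j + \sum_{k \in N_j \setminus i} \mu_{k \to j}))$. The function names, the graph, and the parameter values are hypothetical choices made for this example only.

```python
import math

def lbp_map(mu, J, h):
    """One parallel application of the update map T (binary pairwise model).

    mu[(j, i)] is the natural parameter of the message sent from j to i,
    J[(j, i)] is the coupling (stored for both orientations), h[j] the local field.
    """
    new = {}
    for (j, i) in mu:
        # Cavity field at j: local field plus all incoming messages except the one from i.
        cavity = h[j] + sum(mu[(k, l)] for (k, l) in mu if l == j and k != i)
        new[(j, i)] = math.atanh(math.tanh(J[(j, i)]) * math.tanh(cavity))
    return new

def iterate_to_fixed_point(mu, J, h, tol=1e-12, max_iter=10_000):
    """Apply T repeatedly until the largest message change drops below tol."""
    for _ in range(max_iter):
        nxt = lbp_map(mu, J, h)
        if max(abs(nxt[e] - mu[e]) for e in mu) < tol:
            return nxt
        mu = nxt
    raise RuntimeError("LBP did not converge")

# Hypothetical example: a triangle with weak couplings and small local fields.
J = {}
for i, j in [(0, 1), (1, 2), (0, 2)]:
    J[(i, j)] = J[(j, i)] = 0.3
h = [0.1, -0.2, 0.05]
mu0 = {e: 0.0 for e in J}              # constant initial messages
mu_star = iterate_to_fixed_point(mu0, J, h)
image = lbp_map(mu_star, J, h)
residual = max(abs(image[e] - mu_star[e]) for e in mu_star)
print(residual)                        # tiny: mu_star is numerically a fixed point of T
```

With these weak couplings the map is a contraction, so the iteration converges from the constant initial messages; stronger couplings may require damping or may not converge at all.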

Here we compute the differentiation of the update map $T$ around an LBP fixed point.
This expression was derived by Ikeda et al. (2004) for the cases of turbo and LDPC codes.

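Theorem 23 (Differentiation of the LBP update) At an LBP fixed point, the differentiation
(linearization) of the LBP update is

$$\frac{\partial T(\mu)_{\alpha \to i}}{\partial \mu_{\beta \to j}} =
\begin{cases}
\mathrm{Var}_{b_i}[\phi_i]^{-1} \, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j] & \text{if } j \in N_\alpha \setminus i \text{ and } \beta \in N_j \setminus \alpha, \\
0 & \text{otherwise.}
\end{cases}$$

In other words, at an LBP fixed point $\eta \in L$, the differentiation of $T$ is

$$T' = \mathcal{M}(u),$$

where $u = \{u^{\alpha}_{i \to j}\}$ is given by Eq. (24).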

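Proof First, consider the case that $j \in N_\alpha \setminus i$ and $\beta \in N_j \setminus \alpha$. The derivative is equal to

$$\frac{\partial \theta_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \theta_{\alpha:j}} = \mathrm{Var}_{b_i}[\phi_i]^{-1} \, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j].$$

Another case is $i = j$ and $\alpha, \beta \in N_i$ ($\alpha \neq \beta$). Then, the derivative is

$$\frac{\partial \theta_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \theta_{\alpha:i}} - I = \mathrm{Var}_{b_i}[\phi_i]^{-1} \, \mathrm{Var}_{b_\alpha}[\phi_i] - I = 0,$$

because $\mathrm{Var}_{b_\alpha}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]$ from Eq. (13). In other cases, the derivative is trivially zero. ■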



The relation $j \in N_\alpha \setminus i$ and $\beta \in N_j \setminus \alpha$ is written as $(\beta \to j) \rightharpoonup (\alpha \to i)$ in
Subsection 3.1. It is noteworthy that the elements of the linearization matrix are explicitly
expressed by the fixed point beliefs.
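
The formula of Theorem 23 can be checked numerically in the simplest setting. The sketch below assumes a binary pairwise model with $\phi_i(x_i) = x_i \in \{\pm 1\}$, a pairwise belief of the form $b_{ij} \propto \exp(J x_i x_j + c_i x_i + c_j x_j)$, and the message parametrization $\mu_{j \to i} = \operatorname{atanh}(\tanh(J)\tanh(c_j))$, where $c_j$ collects the local field and the other incoming messages at $j$. It compares $\mathrm{Var}_{b_i}[x_i]^{-1}\mathrm{Cov}_{b_{ij}}[x_i, x_j]$ with a finite-difference derivative of the message. All names and numerical values are illustrative assumptions.

```python
import math

def edge_weight(J, c_i, c_j):
    """Var_{b_i}[x_i]^{-1} Cov_{b_ij}[x_i, x_j] for b_ij ∝ exp(J x_i x_j + c_i x_i + c_j x_j)."""
    states = [(xi, xj) for xi in (-1, 1) for xj in (-1, 1)]
    weights = [math.exp(J * xi * xj + c_i * xi + c_j * xj) for xi, xj in states]
    Z = sum(weights)
    E_i = sum(w * xi for w, (xi, _) in zip(weights, states)) / Z
    E_j = sum(w * xj for w, (_, xj) in zip(weights, states)) / Z
    E_ij = sum(w * xi * xj for w, (xi, xj) in zip(weights, states)) / Z
    cov = E_ij - E_i * E_j
    var_i = 1.0 - E_i ** 2          # x_i^2 = 1 for binary variables
    return cov / var_i

def message(J, c_j):
    """Natural parameter of the message to i as a function of the cavity field at j."""
    return math.atanh(math.tanh(J) * math.tanh(c_j))

def numerical_derivative(J, c_j, eps=1e-6):
    """Central finite difference of the message with respect to the cavity field at j."""
    return (message(J, c_j + eps) - message(J, c_j - eps)) / (2 * eps)

# Hypothetical values for the coupling and the two cavity fields.
J, c_i, c_j = 0.7, 0.3, -0.4
u = edge_weight(J, c_i, c_j)
d = numerical_derivative(J, c_j)
print(u, d)   # the two numbers agree up to finite-difference error
```

Since the cavity field at $j$ depends on each $\mu_{\beta \to j}$ with unit coefficient, the derivative with respect to $c_j$ is the derivative with respect to any such incoming message, which is the nonzero entry in the theorem.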



6.2 Spectral conditions 

Let $T$ be the LBP update map. A fixed point $\mu^*$ is called locally stable^8 if LBP starting
with a point sufficiently close to $\mu^*$ converges to $\mu^*$. To suppress oscillatory behaviors of
LBP, damping of the update, $T_\epsilon := (1 - \epsilon) T + \epsilon I$, is sometimes useful, where $0 \le \epsilon < 1$ is a
damping strength and $I$ is the identity matrix.

As we will summarize in the following theorem, the local stability is determined by the
linearization $T'$ at the fixed point. Since $T'$ is nothing but $\mathcal{M}(u)$ at an LBP fixed point,
Theorem 14 implies relations between the local stability and the Hessian of the Bethe free
energy function.

Theorem 24 Let $\mu^*$ be an LBP fixed point and assume that $T'(\mu^*)$ has no eigenvalues of
unit modulus for simplicity. Then the following statements hold.

1. $\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$ $\Longleftrightarrow$ LBP is locally stable at $\mu^*$.

2. $\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid \mathrm{Re}\,\lambda < 1\}$ $\Longleftrightarrow$ LBP is locally stable at $\mu^*$ with some damping.

3. $\mathrm{Spec}(T'(\mu^*)) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1}$ $\Longrightarrow$ $\mu^*$ is a local minimum of the Bethe free energy function.

8. This property is often referred to as asymptotically stable (Guckenheimer and Holmes, 1990).

Proof 1.: This is a standard result. (See Guckenheimer and Holmes (1990), for example.)
2.: There is an $\epsilon \in [0, 1)$ that satisfies $\mathrm{Spec}(T_\epsilon'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$ if and only if
$\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid \mathrm{Re}\,\lambda < 1\}$. 3.: This assertion is a direct consequence of Theorem
[IlandEi ■ 



This theorem immediately implies that a locally stable LBP fixed point is a local mini- 
mum of the Bethe free energy. The theorem applies to both the multinomial and Gaussian 
cases. 
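
The three spectral conditions are nested, and the gap between them can be made explicit with a few lines of Python. The sketch below takes a list of eigenvalues (assumed to be given; computing the spectrum of $T'(\mu^*)$ is outside its scope) and evaluates each condition. It also uses the elementary identity $|1 - s(1 - \lambda)|^2 = 1 - 2s\,\mathrm{Re}(1 - \lambda) + s^2 |1 - \lambda|^2$ with $s = 1 - \epsilon$ to find which damping strengths stabilize a given spectrum. The function names and sample spectra are hypothetical.

```python
def spectral_conditions(eigs, tol=1e-12):
    """Evaluate the three spectral conditions for a list of eigenvalues.

    Returns (inside_unit_disk, real_part_below_one, avoids_real_ray_from_one).
    """
    eigs = [complex(z) for z in eigs]
    stable = all(abs(z) < 1 for z in eigs)
    damped = all(z.real < 1 for z in eigs)
    off_ray = all(not (abs(z.imag) <= tol and z.real >= 1) for z in eigs)
    return stable, damped, off_ray

def max_step(eigs):
    """Supremum of s = 1 - eps such that every damped eigenvalue 1 - s (1 - lambda)
    lies strictly inside the unit disk for all step sizes in (0, s).
    Returns None when no damping works. Note that s must also satisfy s <= 1."""
    s_max = float("inf")
    for z in eigs:
        d = 1 - complex(z)
        if d.real <= 0:            # Re(lambda) >= 1: damping cannot help
            return None
        s_max = min(s_max, 2 * d.real / abs(d) ** 2)
    return s_max

print(spectral_conditions([0.5, -0.3]))   # all three conditions hold
print(spectral_conditions([-1.5, 0.2]))   # unstable, but stabilized by damping
print(spectral_conditions([2 + 1j]))      # only the third condition holds
print(spectral_conditions([1.2]))         # none of the conditions holds
print(max_step([-1.5, 0.2]))              # about 0.8, so any eps in (0.2, 1) works
```

The spectrum `[2 + 1j]` is an example of the gap discussed next: the third condition holds, so the fixed point is a local minimum, yet no damping makes it stable.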

It is interesting to ask under which conditions a local minimum of the Bethe free energy
function is a locally stable fixed point of (damped) LBP. An implicit reason for the empirical
success of the LBP algorithm is that LBP finds a "good" local minimum rather than a local
minimum near the initial point. The theorem gives new insight into this question, i.e.,
the difference between the stable local minima and the unstable local minima in terms of
the spectrum of $T'(\mu^*)$.

6.3 Special cases: gaps between stability and local minimality 

Here we focus on two special cases: binary pairwise attractive models and pairwise fixed-mean
Gaussian models. Note that a binary pairwise graphical model $\Psi = \{\Psi_{ij}, \Psi_i\}$ is called
attractive if $J_{ij} > 0$, where $\Psi_i(x_i) = \exp(h_i x_i)$ and $\Psi_{ij}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ ($x_i, x_j \in \{\pm 1\}$).
In these cases, the gap between the stable fixed points of LBP and the local minima of the Bethe free
energy function is smaller.

Consider the following situation: we have continuously parametrized compatibility functions
$\{\Psi_{ij}(t), \Psi_i(t)\}_{t \ge 0}$, which are constants at $t = 0$ (e.g., $t$ is an inverse temperature:
$\Psi_{ij}(t) = \exp(t J_{ij} x_i x_j)$ and $\Psi_i(t) = \exp(t h_i x_i)$). Starting from $t = 0$, we run the LBP algorithm
for $t$, find a stable fixed point, and use it as the initial messages of LBP for $t + \delta t$, where $\delta t$ is
a sufficiently small positive number. Then we obtain a trajectory of stable fixed point
beliefs: we call it a belief trajectory. It first continuously follows the local minima and then
it may jump to another stable fixed point belief at $t = t_0$. The following theorem implies
that the stable fixed point becomes unstable by continuous changes of the compatibility
functions exactly when the corresponding local minimum becomes a saddle.

Theorem 25 Suppose that we have continuously parametrized compatibility functions of an
attractive binary pairwise model or a fixed-mean Gaussian model as above. If the LBP fixed
point becomes unstable across $t = t_0$ for the first time following the belief trajectory, then the
corresponding local minimum of the Bethe free energy becomes a saddle point across $t = t_0$.

Proof First consider the case of attractive binary pairwise models. From Eq. (11), we
see that $b_{ij}(x_i, x_j) \propto \exp(J_{ij} x_i x_j + \theta_i x_i + \theta_j x_j)$ for some $\theta_i$ and $\theta_j$. From $J_{ij} > 0$, we
have $\mathrm{Cov}_{b_{ij}}[x_i, x_j] > 0$, and thus $u_{i \to j} > 0$. When the LBP fixed point becomes unstable,
the Perron-Frobenius eigenvalue of $\mathcal{M}(u)$ goes over 1, which means $\det(I - \mathcal{M}(u))$ crosses
0. From Theorem 14, we see that $\det(\nabla^2 F)$ changes from positive to negative at $t = t_0$. The
Gaussian case can be proved analogously. Recall that the weights $u_{i \to j}$ are always positive
scalars, as shown in Corollary 13. ■



Theorem 25 extends Theorem 2 of Mooij and Kappen (2005), which discusses only the
case of binary pairwise models with vanishing local fields $h_i = 0$ and the trivial fixed point
(i.e., $\mathrm{E}_{b_i}[x_i] = 0$).
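
The belief trajectory construction of this subsection can be sketched in Python as a warm-started continuation in $t$. The example assumes an attractive binary pairwise model on the complete graph with four vertices, the inverse-temperature parametrization $\Psi_{ij}(t) = \exp(t J_{ij} x_i x_j)$, $\Psi_i(t) = \exp(t h_i x_i)$, and the single-parameter message update for binary variables. The graph, the values $J_{ij} = 1$ and $h_i = 0.1$, the step size, and all function names are hypothetical.

```python
import math

def lbp_fixed_point(mu, J, h, tol=1e-10, max_iter=100_000):
    """Run parallel LBP from the initial messages mu until the messages stop changing."""
    for _ in range(max_iter):
        new = {}
        for (j, i) in mu:
            # Cavity field at j: local field plus incoming messages except the one from i.
            cavity = h[j] + sum(mu[(k, l)] for (k, l) in mu if l == j and k != i)
            new[(j, i)] = math.atanh(math.tanh(J[(j, i)]) * math.tanh(cavity))
        if max(abs(new[e] - mu[e]) for e in mu) < tol:
            return new
        mu = new
    raise RuntimeError("LBP did not converge")

def magnetizations(mu, h):
    """Means E_{b_i}[x_i] of the single-variable beliefs."""
    return [math.tanh(h[i] + sum(mu[(k, l)] for (k, l) in mu if l == i))
            for i in range(len(h))]

n = 4
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]   # complete graph
J0 = {e: 1.0 for e in pairs}      # attractive couplings
h0 = [0.1] * n                    # small positive local fields
mu = {e: 0.0 for e in pairs}      # constant messages at t = 0

trajectory = []
steps = 20
for s in range(steps + 1):
    t = 2.0 * s / steps
    J = {e: t * v for e, v in J0.items()}
    h = [t * v for v in h0]
    mu = lbp_fixed_point(mu, J, h)            # warm start from the previous fixed point
    trajectory.append((t, magnetizations(mu, h)[0]))

for t, m in trajectory:
    print(f"t = {t:.1f}  E[x_0] = {m:.6f}")
```

In this example the trajectory stays on a single stable branch because the positive local field selects it. Detecting the instability of Theorem 25 would additionally require the spectrum of the linearization along the trajectory.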



7. Summary and discussions 

We have established a connection between the graph zeta function, the Bethe free energy and loopy
belief propagation. We have shown that this connection provides powerful tools for the
analysis of the Bethe free energy and LBP; key theorems are given in Section 4. In Section 5,
based on these theorems, we analyzed the (non-)convexity of the Bethe free energy function.
Roughly speaking, the positive definite region of the Bethe free energy function shrinks as the
Perron-Frobenius eigenvalue of the directed edge matrix becomes large, or equivalently, as
the pole of the Ihara zeta function closest to the origin approaches zero. We have shown
that such knowledge can be used to derive the uniqueness property of LBP. In Section 6,
we have shown that the local stability of LBP implies local minimality of the Bethe free energy
as long as LBP is well defined within a class of exponential families. A key observation is
that the matrix $\mathcal{M}(u)$ is equal to the linearization of the LBP update at LBP fixed points.

The Bethe-zeta formula shows that the Bethe free energy function contains information
on the graph geometry, especially on the prime cycles. The formula helps extract graph
information from the Bethe free energy function. For example, we observed that the number
of spanning trees is derived from a limit of the Bethe free energy function. In a sense,
the connection between those three objects seems to be natural, as all of them become
"trivial" if the associated graph structure is a tree. If the associated hypergraph is a tree, the
zeta function is equal to 1, the Bethe free energy function is equal to the Gibbs free energy
function, and LBP reduces to the original BP, which computes exact marginals in finitely many
steps.
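
The triviality of the zeta function on trees can be seen directly from the directed edge matrix. The sketch below, restricted to ordinary graphs as an assumption, builds this matrix and checks that it is nilpotent for a tree: there are then no closed non-backtracking walks, hence no prime cycles, and $\det(I - u\mathcal{M}) = 1$. For a triangle the matrix is a permutation matrix and is not nilpotent. The helper names and the two test graphs are hypothetical.

```python
def non_backtracking_matrix(edges):
    """Directed edge matrix of an undirected graph.

    Entry [e][f] is 1 when the directed edge f = (a, b) feeds into e = (b, c) with c != a.
    """
    directed = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    return [[1 if (f[1] == e[0] and f[0] != e[1]) else 0 for f in directed]
            for e in directed]

def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def is_nilpotent(M):
    """Check whether M^n = 0, where n is the size of M."""
    n = len(M)
    P = M
    for _ in range(n - 1):      # after the loop, P equals M^n
        P = matmul(P, M)
    return all(v == 0 for row in P for v in row)

tree = [(0, 1), (1, 2), (1, 3), (3, 4)]      # a small tree
triangle = [(0, 1), (1, 2), (0, 2)]          # a single cycle

print(is_nilpotent(non_backtracking_matrix(tree)))       # True
print(is_nilpotent(non_backtracking_matrix(triangle)))   # False
```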



7.1 Path forward

In this subsection, we list a few directions of future research going beyond the results of
this paper.

In a sequel paper (Watanabe and Fukumizu, 2011), we further exploit the connection
between LBP, the Bethe free energy and the graph zeta function to analyze the LBP fixed point
equation, focusing on binary pairwise models. We characterize the class of signed graphs on
which uniqueness of the LBP fixed point is guaranteed. Note that the signs on the edges
represent those of the interactions (i.e., $\mathrm{sgn}\, J_{ij}$). This condition is in contrast to those of
past research and the result in Section 5, where the strengths of the interactions (i.e., $|J_{ij}|$)
are bounded.

In Subsection 5.2, we have derived a condition for the convexity of the restricted Bethe
free energy function. Unfortunately, the expression of the weight W involves the sup operator
and is not easy to compute directly. We need further consideration to find a way to
compute it more easily. The proof of the conjecture W < N is also an interesting problem.

The connection between the graph zeta function, the Bethe free energy and LBP can be extended
to a more general class of free energies including fractional and tree-reweighted types
(Wiegerinck and Heskes, 2003; Wainwright et al., 2003b). These free energies are obtained
by modifying the coefficients in the definition of the Bethe free energy function. The
corresponding graph zeta function then becomes the Bartholdi type, which allows cycles with
backtracking (Bartholdi, 1999; Iwao, 2006). The relation may be useful to analyze such a
class of free energies.

Appendix A.

A.1 Miscellaneous properties of one-variable hypergraph zeta function

This subsection provides miscellaneous facts related to the one-variable hypergraph zeta
functions. In the analyses of this paper, we sometimes reduce the multivariate zeta to the
one-variable zeta. Therefore, it is important to understand the one-variable hypergraph
zeta $\zeta_H(u)$ and the directed edge matrix $\mathcal{M}$.

Recall that $\rho(X)$ denotes the spectral radius of $X$. We have the following bounds on
the spectral radius of $\mathcal{M}$.

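Proposition 26 For $e \in E$, let $k_e := |\{e' \in E ;\, e' \rightharpoonup e\}|$, $k_m := \min_e k_e$ and $k_M := \max_e k_e$.
Then

$$k_m \le \rho(\mathcal{M}) \le k_M.$$

Therefore, if $H$ is a graph,

$$\min_{i \in V} d_i - 1 \le \rho(\mathcal{M}) \le \max_{i \in V} d_i - 1. \tag{30}$$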



Proof Since $k_e = \sum_{e'} \mathcal{M}_{e,e'}$, the bound follows from the elementary row-sum bound on the spectral radius
of non-negative matrices. See Theorem 8.1.22 of Horn and Johnson (1990). ■
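
The bound of Eq. (30) can be checked numerically for small graphs. The following sketch builds the directed edge matrix of an ordinary graph and estimates its spectral radius by power iteration on $\mathcal{M} + I$. The shift by the identity is a choice made here to avoid oscillation when $\mathcal{M}$ is periodic, and it relies on the spectral radius of a non-negative matrix being an eigenvalue. The helper names, the iteration count, and the two test graphs are hypothetical.

```python
def directed_edge_matrix(edges):
    """Entry [e][f] is 1 when the directed edge f = (a, b) feeds into e = (b, c) with c != a."""
    directed = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    return [[1 if (f[1] == e[0] and f[0] != e[1]) else 0 for f in directed]
            for e in directed]

def spectral_radius(M, iters=2000):
    """Estimate the spectral radius of a non-negative matrix by power iteration on M + I."""
    n = len(M)
    x = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        y = [x[r] + sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]  # (M + I) x
        top = max(y)
        x = [v / top for v in y]      # renormalize so that max(x) = 1
        rho = top - 1.0               # undo the shift by the identity
    return rho

def degree_bounds(edges):
    """Return (min_i d_i - 1, max_i d_i - 1)."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return min(deg.values()) - 1, max(deg.values()) - 1

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]     # regular of degree 3
k4_minus_edge = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]  # degrees 3, 3, 2, 2

rho_k4 = spectral_radius(directed_edge_matrix(k4))
rho_k4m = spectral_radius(directed_edge_matrix(k4_minus_edge))
print(rho_k4, degree_bounds(k4))
print(rho_k4m, degree_bounds(k4_minus_edge))
```

For the regular graph both bounds coincide and the spectral radius equals $d - 1$; for the non-regular graph the estimate falls between the two bounds.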



Since the directed edge matrix $\mathcal{M}$ is non-negative, the spectral radius is equal to the
Perron-Frobenius eigenvalue. The pole of $\zeta_H$ closest to the origin is $u = \rho(\mathcal{M})^{-1} \ge k_M^{-1}$. For the
case of Ihara's zeta function, a bound on the modulus of imaginary poles as well as Eq. (30)
are given by Kotani and Sunada (2000).

For an arbitrary hypergraph, $\zeta_H(u)$ has a pole at $u = 1$ because $\det(I - \mathcal{M}) = 0$. The
following theorem gives the multiplicity of the pole. The original version of this theorem was
proved by Hashimoto (1989, 1990).



Theorem 27 (Hypergraph Hashimoto's theorem (Hashimoto, 1989; Storm, 2006))

Let $\chi(H) := |V| + |F| - |E|$ be the Euler number of $H$.



U] 



where $\kappa(B_H)$ is the number of spanning trees of the bipartite graph $B_H$. ($B_H$ is the bipartite
graph representation of the hypergraph $H$.)



Proof For a graph $G = (V, E)$, Hashimoto (1989, 1990) proved that

$$\lim_{u \to 1} Z_G(u)^{-1} (1 - u)^{-|E| + |V| - 1} = -2^{|E| - |V| + 1} (|E| - |V|) \, \kappa(G),$$

where $\kappa(G)$ is the number of spanning trees of $G$. A simple proof is given by Northshield
(1998). Since there is a one-to-one correspondence between the prime cycles of $H$ and those of $B_H$, we have
$\zeta_H(u) = Z_{B_H}(\sqrt{u})$. Then the assertion follows from the above formula.

A.2 Detailed Proofs

Proof of Theorem [4] The conditions for stationary points of the Bethe free energy 
function are (9^^) = 6*^^) and J2aBii~^ct:i + Oa-.i) + (1 - di)6i = 0. 

(1. $\Rightarrow$ 2.) The correspondence from the fixed point message to the stationary point is given
by Eqs. (10) and (11). From this construction, we see that

a£F a i 

This implies the above stationary point conditions.

(2. =^ 1.) The converse correspondence is given by ma^i{xi) = exp((6'i + 9a-,i — Oa:i,(pi)), 
where {Oa,Oi} are the natural parameters of the stationary point pseudomarginals {ba{xa), 
From this construction and the stationary point conditions, we have 

JJ mi3^i{xi) = exp((6'j,</>i(xi))) oc bi{xi), 

'^aiXa)Y\_ n = 6^P((^(a>''?^(a>(^")) + {0a:i,<Pi)) « ba{Xa). 

tea l3eNi-^a i€Na 

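Therefore, the local consistency condition Eq. (13) implies that

$$m_{\alpha \to i}(x_i) \propto \int \Psi_\alpha(x_\alpha) \prod_{j \in N_\alpha \setminus i} \, \prod_{\beta \in N_j \setminus \alpha} m_{\beta \to j}(x_j) \, \mathrm{d}\nu_{\alpha \setminus i}.$$

This is equivalent to the LBP fixed point equation.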

□ 

Proof of Theorem 6 The following proof proceeds analogously to that of Theorem 3
in Stark and Terras (1996). First define a differential operator



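where $(u_{e' \to e})_{a_e, a_{e'}}$ denotes the $(a_e, a_{e'})$ element of the matrix $u_{e' \to e}$. If we apply this
operator to a product of $k$ $u$ terms, it is multiplied by $k$. Since $\log \zeta_H(0) = 0$ and
$\log\det(I - \mathcal{M}(0))^{-1} = 0$, it is enough to prove that $\mathcal{H} \log \zeta_H(u) = \mathcal{H} \log\det(I - \mathcal{M}(u))^{-1}$. Using the
equations $\log\det X = \operatorname{tr}\log X$ and $-\log(1 - x) = \sum_{k \ge 1} x^k / k$, we have

$$\begin{aligned}
\mathcal{H} \log \zeta_H(u) &= \mathcal{H} \sum_{p \in \mathfrak{P}_H} -\log\det(I - \pi(p)) \\
&= \mathcal{H} \sum_{p \in \mathfrak{P}_H} \sum_{k \ge 1} \frac{1}{k} \operatorname{tr}(\pi(p)^k) \qquad (31) \\
&= \sum_{p \in \mathfrak{P}_H} \sum_{k \ge 1} |p| \operatorname{tr}(\pi(p)^k) \qquad (32) \\
&= \sum_{C :\, \text{closed geodesic}} \operatorname{tr}(\pi(C)) = \sum_{k \ge 1} \operatorname{tr}(\mathcal{M}(u)^k).
\end{aligned}$$

From Eq. (31) to Eq. (32), notice that $\mathcal{H}$ acts as a multiplication by $k|p|$ on each summand.
This is because each summand is a sum of terms of degree $k|p|$, counting each $(u_{e' \to e})_{a_e, a_{e'}}$ as degree
one.

On the other hand, one easily observes that

$$\mathcal{H} \log\det(I - \mathcal{M}(u))^{-1} = \mathcal{H} \sum_{k \ge 1} \frac{1}{k} \operatorname{tr}(\mathcal{M}(u)^k) = \sum_{k \ge 1} \operatorname{tr}(\mathcal{M}(u)^k).$$

Thus, the proof is completed.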

□ 

Proof of Theorem 7

The proof is based on the decomposition in the following lemma and determinant
manipulations. We define a linear operator by

$$\mathcal{T} : X(V) \to X(E), \qquad (\mathcal{T} g)(e) := g(t(e)).$$

The vector spaces $X(E)$ and $X(V)$ have natural inner products. The adjoint of $\mathcal{T}$ is given by

$$\mathcal{T}^* : X(E) \to X(V), \qquad (\mathcal{T}^* f)(i) := \sum_{e :\, t(e) = i} f(e).$$

These linear operators satisfy the following relation.
Lemma

$$\mathcal{M}(u) = \iota(u) \mathcal{T} \mathcal{T}^* - \iota(u)$$
Proof [Proof of Lemma] Let / G X{V). 

\{u)TT*-.{u))f{e)= Y: <tS)^tie) E E 4")^.(e)/(«") 

..s(e') = s{e) e":t{e")=t{e') „„. s{e")^s(e) 

" ■ t(e')#t(e) • t{e")^t{e) 



E <eVt(e) E 



*t{e')^t(e) 

{M{u)f){e). 



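Using this lemma, we have

$$\begin{aligned}
\zeta_H(u)^{-1} &= \det(I - \mathcal{M}(u)) \\
&= \det(I - \iota(u) \mathcal{T} \mathcal{T}^* + \iota(u)) \\
&= \det(I - \iota(u) \mathcal{T} \mathcal{T}^* (I + \iota(u))^{-1}) \det(I + \iota(u)) \\
&= \det(I_{r_V} - \mathcal{T}^* (I + \iota(u))^{-1} \iota(u) \mathcal{T}) \prod_{\alpha \in F} \det(U_\alpha).
\end{aligned}$$

It is easy to see that $I_{r_V} - \mathcal{T}^* (I + \iota(u))^{-1} \iota(u) \mathcal{T} = I_{r_V} - \mathcal{T}^* \mathcal{T} + \mathcal{T}^* (I + \iota(u))^{-1} \mathcal{T}$. We also
see that

$$(\mathcal{T}^* \mathcal{T} g)(i) = \sum_{e :\, t(e) = i} g(t(e)) = d_i \, g(i)$$

and

$$(\mathcal{T}^* (I + \iota(u))^{-1} \mathcal{T} g)(i) = \sum_{e :\, t(e) = i} ((I + \iota(u))^{-1} \mathcal{T} g)(e) = (W g)(i).$$

□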
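
For an ordinary graph with scalar weight $u$, the classical counterpart of this determinant formula is the Ihara-Bass formula $\det(I - u\mathcal{M}) = (1 - u^2)^{|E| - |V|} \det(I - uA + u^2(D - I))$, where $A$ is the adjacency matrix and $D$ the degree matrix. The sketch below checks this identity in exact rational arithmetic. The restriction to the one-variable graph case, the helper names, and the test graphs are assumptions made for illustration; the theorem proved above is the more general multivariate hypergraph statement.

```python
from fractions import Fraction

def det(A):
    """Exact determinant by Gaussian elimination over Fractions."""
    A = [row[:] for row in A]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result          # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result

def ihara_bass_sides(edges, n_vertices, u):
    """Return both sides of the Ihara-Bass formula for a simple graph and a Fraction u."""
    directed = list(edges) + [(b, a) for a, b in edges]
    # Left side: det(I - u M) with M the directed edge (non-backtracking) matrix.
    lhs_matrix = [[(1 if e == f else 0) - u * (1 if (f[1] == e[0] and f[0] != e[1]) else 0)
                   for f in directed] for e in directed]
    deg = [0] * n_vertices
    adj = [[0] * n_vertices for _ in range(n_vertices)]
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
        adj[a][b] = adj[b][a] = 1
    # Right side: (1 - u^2)^(|E| - |V|) det(I - u A + u^2 (D - I)).
    rhs_matrix = [[(1 + u * u * (deg[i] - 1) if i == j else 0) - u * adj[i][j]
                   for j in range(n_vertices)] for i in range(n_vertices)]
    lhs = det(lhs_matrix)
    rhs = (1 - u * u) ** (len(edges) - n_vertices) * det(rhs_matrix)
    return lhs, rhs

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangle = [(0, 1), (1, 2), (0, 2)]
u = Fraction(1, 3)
lhs_k4, rhs_k4 = ihara_bass_sides(k4, 4, u)
lhs_tri, rhs_tri = ihara_bass_sides(triangle, 3, u)
print(lhs_k4 == rhs_k4, lhs_tri == rhs_tri)
```

Working with `Fraction` avoids floating-point tolerance questions: the two sides are compared for exact equality.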



Proof of Proposition 16 The right inequality is obvious. We prove the left inequality.
Let $C = \rho(\mathcal{M}(\|u\|))$. It is enough to prove that $\det(I - z\mathcal{M}(u))$ has no root in $\{z \in \mathbb{C} \mid |z| < C^{-1}\}$.
Accordingly, we show that $\zeta_H(zu)$ has no pole in the set. Let $p$ be a prime
cycle and let $\lambda_1, \ldots, \lambda_r$ be the eigenvalues of $\pi(p; u)$. Then we obtain $\max_i |\lambda_i| \le \pi(p; \|u\|)$.
Therefore, if $|z| < \pi(p; \|u\|)^{-1/|p|}$, we have

$$\left| \det(I - z^{|p|} \pi(p; u)) \right| \ge \left( 1 - |z|^{|p|} \pi(p; \|u\|) \right)^r.$$

It is not difficult to see that, for an arbitrary prime cycle $p$, the inequality $C^{-1} \le \pi(p; \|u\|)^{-1/|p|}$
holds. Therefore, if $|z| < C^{-1}$,

$$|\zeta_H(zu)| = \Big| \prod_{p \in \mathfrak{P}_H} \det(I - z^{|p|} \pi(p; u))^{-1} \Big| \le \prod_{p \in \mathfrak{P}_H} \left( 1 - |z|^{|p|} \pi(p; \|u\|) \right)^{-r} = \zeta_H(|z| \, \|u\|)^r < \infty.$$

□



Proof of Theorem 18 (ii): Multinomial case First, we consider the binary case, i.e.,
$\phi_i(x_i) = x_i \in \{\pm 1\}$. For $t \in [0, 1]$, let us define $\eta_{ij}(t) = \mathrm{E}_{b_{ij}}[x_i x_j] = t$ and $\eta_i(t) = 0$.
Accordingly, $u^{\alpha}_{i \to j} = t$ and $\eta(t) \in L$. As $t \to 1$, $\eta(t)$ approaches a boundary point of
$L$. Using Theorem 27, analogously to the fixed-mean Gaussian case, we see that $\det(\nabla^2 F(t))$
becomes negative as $t \to 1$ if $n(H) > 1$. Therefore, $F$ is not convex on $L$.

For general multinomial inference families, the non-convexity of $F$ is deduced from
the binary case. There is a face of (the closure of) $L$ that is identified with the set of
pseudomarginals of the binary inference family on the same hypergraph. Since $0 \log 0 = 0$,
we see that the restriction of $F$ to the face is the Bethe free energy function of the binary
inference family. Since this restriction is not convex, $F$ is not convex.



□